CrewAI · Strands SDK · AI Agents · Data Engineering · Python

CrewAI vs Strands SDK: My 90-Day Production Comparison

Viswanath Nagarajan · 5 min read


Last year I got to do something rare: evaluate two competing agentic AI frameworks in a real production environment, on real financial data, with real consequences if something broke.

The context: a multi-agent system to automate legacy code conversion and reduce engineering onboarding from weeks to hours. I ran CrewAI and Strands SDK in parallel for 90 days on the same codebase. This is what I found.

Spoiler: they're not competing. They're complementary. But you need to know when to use which.

The Setup

The use case: a system of agents that could analyze a legacy COBOL/Java codebase, understand its data contracts, generate equivalent Python/PySpark, validate the output, and document the transformation.

This required:

  • Long-running tasks (analysis could take 30+ minutes)
  • Tool use (file system, SQL queries, AWS API calls)
  • Agent memory across a multi-step workflow
  • Graceful degradation when an agent got confused

CrewAI: Strengths

CrewAI's mental model is intuitive if you think in teams. You define agents with roles, goals, and backstories — then define tasks and let the crew coordinate.

from crewai import Agent, Task, Crew

analyst = Agent(
    role='Legacy Code Analyst',
    goal='Understand COBOL data contracts and business logic',
    backstory='You are an expert in legacy financial systems...',
    tools=[file_reader, schema_extractor],
    verbose=True,
)

converter = Agent(
    role='Python Migration Engineer',
    goal='Convert legacy code to idiomatic PySpark',
    backstory='You write clean, tested, documented Python...',
    tools=[code_generator, test_runner],
)

analyze_task = Task(
    description='Analyze the COBOL program and extract all data contracts',
    agent=analyst,
    expected_output='JSON schema of all inputs/outputs',
)

convert_task = Task(
    description='Convert the analyzed program to PySpark',
    agent=converter,
    context=[analyze_task],
    expected_output='Tested PySpark module with docstrings',
)

crew = Crew(agents=[analyst, converter], tasks=[analyze_task, convert_task])
result = crew.kickoff()

What CrewAI does well:

  • Role-based agent definition is natural and readable
  • Task context passing works reliably — agents genuinely use outputs from prior tasks
  • The process model (sequential vs hierarchical) is flexible enough for most workflows
  • Strong community; lots of examples for data engineering use cases

Where it struggled:

  • Long-running tasks (30+ min) would occasionally lose context in the middle
  • Tool error handling required a lot of custom wrapping — uncaught tool exceptions could derail the entire crew
  • Memory persistence across sessions required significant custom infrastructure
  • The verbose output, while helpful for debugging, was noisy in production logs
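To give a concrete sense of the "custom wrapping" above: the pattern we converged on was a decorator that converts tool exceptions into error strings the agent can see and react to, rather than letting them propagate and kill the crew. This is a minimal sketch in plain Python (the `safe_tool` name and the `TOOL_ERROR` convention are my own, not part of CrewAI):

```python
import functools

def safe_tool(fn):
    """Wrap a tool so exceptions become error strings the agent can
    react to, instead of uncaught exceptions derailing the whole crew."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except Exception as exc:
            # Return the failure as data so the LLM can retry or re-plan
            return f"TOOL_ERROR ({fn.__name__}): {exc}"
    return wrapper

@safe_tool
def read_file(path: str) -> str:
    """Read a source file for the analyst agent."""
    with open(path) as f:
        return f.read()
```

The key design choice: the error goes back into the conversation as text, so the agent can decide to retry, skip, or escalate instead of the whole run dying on one bad file path.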

Strands SDK: Strengths

Strands SDK (Amazon's framework) takes a different philosophical position: agents are defined by their tools, not their roles. The tool definitions are first-class citizens.

from strands import Agent, tool

@tool
def extract_schema(file_path: str) -> dict:
    """Extract data contract schema from a legacy file."""
    ...  # implementation omitted; returns the parsed schema dict

@tool
def generate_pyspark(schema: dict, business_logic: str) -> str:
    """Generate validated PySpark code from schema and logic description."""
    ...  # implementation omitted; returns the generated PySpark source

agent = Agent(
    model="anthropic.claude-3-5-sonnet-20241022-v2:0",
    tools=[extract_schema, generate_pyspark],
    system_prompt="You are a data migration specialist..."
)

result = agent("Convert the COBOL program at /path/to/file.cbl to PySpark")

What Strands does well:

  • Tool definition via Python decorators is clean and type-safe
  • AWS Bedrock integration is seamless — critical for our AWS-native stack
  • Agents are stateless by default, which is actually a feature for our use case (idempotent runs)
  • Error handling at the tool level is much cleaner — exceptions stay contained
  • Streaming output works out of the box

Where it struggled:

  • Multi-agent coordination requires more manual orchestration — no built-in "crew" concept
  • Less community content (it's newer)
  • The lack of built-in agent memory meant we had to build our own persistence layer
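For the persistence layer, we didn't need anything exotic: since Strands agents are stateless, all we had to own was storing and reloading message history between runs. This is a stripped-down sketch of the idea (the `SessionStore` class and file layout are illustrative, not a Strands API):

```python
import json
from pathlib import Path

class SessionStore:
    """Minimal file-backed memory layer: persist an agent's message
    history between otherwise-stateless runs."""

    def __init__(self, root: str = ".agent_sessions"):
        self.root = Path(root)
        self.root.mkdir(exist_ok=True)

    def load(self, session_id: str) -> list:
        """Return prior messages for this session, or [] on first run."""
        path = self.root / f"{session_id}.json"
        return json.loads(path.read_text()) if path.exists() else []

    def save(self, session_id: str, messages: list) -> None:
        """Persist the full message list after a run completes."""
        (self.root / f"{session_id}.json").write_text(json.dumps(messages))
```

In practice you'd load the stored messages before constructing the agent and save the updated history after each invocation; swap the JSON files for DynamoDB or S3 when you need durability beyond one box.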

The Decision Framework

After 90 days, here's how I think about it:

| Use Case | Winner |
|----------|--------|
| Multi-agent workflows with clear roles | CrewAI |
| AWS-native, tool-heavy single agents | Strands |
| Long-running stateful analysis | CrewAI (with custom memory) |
| Idempotent, repeatable tasks | Strands |
| Rapid prototyping | CrewAI |
| Production reliability on AWS | Strands |

What We Actually Shipped

The answer: both. The outer orchestration layer uses CrewAI (a Crew of 4 agents: Analyzer, Converter, Validator, Documenter). Each individual agent, when it needs to call AWS services or execute tools reliably, uses a Strands agent internally.

Think of it as: CrewAI for the workflow, Strands for the tool execution.
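The glue between the two layers is small: a Strands agent is just a callable that takes a prompt and returns a result, so it can be exposed to CrewAI as an ordinary tool function. A hedged sketch of the adapter (the `strands_backed_tool` helper is mine; in production `agent_callable` would be a real Strands `Agent`, and you'd register the result via CrewAI's tool mechanism):

```python
def strands_backed_tool(name: str, description: str, agent_callable):
    """Expose an agent (any prompt -> result callable) as a plain
    function that an outer framework like CrewAI can register as a tool."""
    def tool_fn(prompt: str) -> str:
        # Delegate the actual work to the inner agent and return text
        return str(agent_callable(prompt))
    # Give the function the metadata tool registries typically read
    tool_fn.__name__ = name
    tool_fn.__doc__ = description
    return tool_fn
```

This keeps the boundary clean: CrewAI only ever sees a named, documented function, and all the Bedrock calls, retries, and tool execution stay inside the Strands agent.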

The result: onboarding time dropped from weeks to hours. New engineers describe what they need in plain English, and the crew handles the rest.

What's Next

This architecture directly informed AgentFlow — my open-source framework for agentic ETL pipelines. The same CrewAI + Strands composition pattern, applied to data pipeline orchestration instead of code conversion.

If you're building multi-agent systems for data engineering, I'd start with CrewAI for the workflow design and layer in Strands (or LangChain tool definitions) for the heavy lifting. Don't pick one — understand what each does best.


Questions? Disagreements? I'm building AgentFlow in public and posting weekly on LinkedIn. Hit me there.

Found this useful?

Share it on LinkedIn to help other data engineers — and follow for weekly posts on AI-native data engineering.