I Replaced My Airflow DAG with 3 Lines of English
After four years of writing data pipelines — Airflow DAGs, custom ETL, serverless orchestration on AWS Lambda and Step Functions — I kept running into the same wall.
DAGs don't describe what you want. They describe what you wrote.
When a pipeline fails at 2am, you're not debugging intent. You're debugging code. And most of the time, the intent was simple: "take this data, clean it, load it, and tell me if something looks wrong."
So I built AgentFlow.
The Problem with DAGs
Here's a real Airflow DAG I inherited — simplified, but representative:
```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

# extract_customers, validate_schema, transform_records,
# load_to_warehouse, and send_slack_alert are defined elsewhere in the repo

default_args = {
    'owner': 'data-team',
    'retries': 3,
    'retry_delay': timedelta(minutes=5),
    'email_on_failure': True,
}

with DAG(
    'customer_etl',
    default_args=default_args,
    schedule_interval='@daily',
    start_date=datetime(2024, 1, 1),  # required by Airflow, easy to forget
    catchup=False,
) as dag:
    extract = PythonOperator(task_id='extract', python_callable=extract_customers)
    validate = PythonOperator(task_id='validate', python_callable=validate_schema)
    transform = PythonOperator(task_id='transform', python_callable=transform_records)
    load = PythonOperator(task_id='load', python_callable=load_to_warehouse)
    notify = PythonOperator(task_id='notify', python_callable=send_slack_alert)

    extract >> validate >> transform >> load >> notify
```
200+ lines across 6 files. When a schema changed upstream, I'd spend hours tracing which task needed updating, why the retry logic wasn't catching a specific error type, and why the notification wasn't firing.
The DAG told Airflow how to run. But I had to manually encode what I wanted.
The AgentFlow Approach
The same pipeline in AgentFlow:
```python
from agentflow import Pipeline

pipeline = Pipeline.from_goal("""
    Extract customer records from PostgreSQL daily.
    Validate against the expected schema and flag anomalies.
    Load clean records to Redshift. Alert on Slack if error rate exceeds 2%.
""")

pipeline.deploy()
```
That's it.
AgentFlow uses a multi-agent system under the hood — built on top of CrewAI and the Strands SDK — to:
- Parse intent from the natural language goal
- Generate a typed pipeline plan with explicit steps, validation rules, and error thresholds
- Execute with self-healing — if a step fails, an agent diagnoses the failure, attempts a fix, and retries before escalating
- Auto-document — generates human-readable runbooks for every pipeline it runs
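To make the "typed pipeline plan" idea concrete, here's a minimal sketch of what a generated plan might contain. The `Step` and `PipelinePlan` names are illustrative, not AgentFlow's actual internal types:

```python
from dataclasses import dataclass, field

# Illustrative only: a structured plan the Planner Agent might emit
# from the natural-language goal above.
@dataclass
class Step:
    name: str        # e.g. "extract", "validate", "load"
    action: str      # semantic description of what the step does
    retries: int = 3

@dataclass
class PipelinePlan:
    source: str
    sink: str
    error_threshold: float  # e.g. 0.02 for the 2% Slack alert
    steps: list[Step] = field(default_factory=list)

plan = PipelinePlan(
    source="postgresql://customers",
    sink="redshift://warehouse",
    error_threshold=0.02,
    steps=[
        Step("extract", "pull daily customer records"),
        Step("validate", "check schema, flag anomalies"),
        Step("load", "write clean records to the warehouse"),
    ],
)
```

The point of the typed plan is that every threshold and rule the goal implies becomes an explicit, inspectable field instead of logic buried across operator callables.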
Why This Matters at Scale
Working with high-volume financial data taught me three things about pipeline reliability:
- Onboarding is the bottleneck. New engineers spend more time understanding existing DAGs than building new ones. When you describe intent instead of implementation, that cost drops dramatically.
- Transient failures dominate incident logs. Most 2am pages are retryable errors that a smarter retry strategy would have caught. Self-healing agents handle this automatically.
- Documentation rots because it's written after the fact. AgentFlow generates it from execution, so it stays current by default.
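The second point, a smarter retry strategy, can be sketched in a few lines: retry transient errors with exponential backoff, but fail fast on permanent ones like a schema mismatch. This is a simplified sketch of the idea, not AgentFlow's actual healer logic:

```python
import time

# Illustrative sketch: only transient errors are worth retrying;
# anything else should escalate immediately instead of burning retries.
TRANSIENT = (ConnectionError, TimeoutError)

def run_with_retries(task, max_retries=3, base_delay=1.0):
    for attempt in range(max_retries + 1):
        try:
            return task()
        except TRANSIENT:
            if attempt == max_retries:
                raise  # out of attempts: escalate to a human
            time.sleep(base_delay * 2 ** attempt)  # exponential backoff
        # non-transient exceptions propagate on the first attempt
```

A blanket `retries=3` in Airflow's `default_args` retries everything, including errors that can never succeed; distinguishing the two classes is what keeps retryable blips out of the 2am pages.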
The Architecture
```
┌─────────────────────────────────────────┐
│ AgentFlow Core                          │
│                                         │
│ ┌───────────┐    ┌───────────────────┐  │
│ │ Planner   │───▶│ Execution Agents  │  │
│ │ Agent     │    │ (per step)        │  │
│ └───────────┘    └───────────────────┘  │
│       │                  │              │
│       ▼                  ▼              │
│ ┌───────────┐    ┌───────────────────┐  │
│ │ Schema    │    │ Healer Agent      │  │
│ │ Validator │    │ (on failure)      │  │
│ └───────────┘    └───────────────────┘  │
└─────────────────────────────────────────┘
                    │
                    ▼
        AWS / Airflow / Prefect
 (AgentFlow deploys to your existing infra)
```
AgentFlow doesn't replace your infrastructure. It sits in front of it. The Planner Agent converts your goal into a structured execution plan. Each step runs via an Execution Agent that understands the semantics of what it's doing — not just the code.
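The "sits in front of it" idea can be sketched as one plan compiling to multiple backend adapters. The class names and return shapes here are illustrative assumptions, not AgentFlow's real adapter API:

```python
# Illustrative sketch: the same ordered step list compiles to whichever
# orchestrator you already run.
class Backend:
    def compile(self, steps):
        raise NotImplementedError

class AirflowBackend(Backend):
    def compile(self, steps):
        # A real adapter would emit PythonOperators and >> dependencies;
        # here we just render the dependency chain.
        return " >> ".join(steps)

class StepFunctionsBackend(Backend):
    def compile(self, steps):
        # A real adapter would emit a full ASL state machine definition.
        return {
            "StartAt": steps[0],
            "States": {s: {"Type": "Task"} for s in steps},
        }

steps = ["extract", "validate", "transform", "load", "notify"]
```

Keeping the plan backend-agnostic is what lets the same three-line goal target Lambda + Step Functions today and Prefect or Dagster later without rewriting intent.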
Current Status
AgentFlow is in early alpha. The GitHub repo has the core engine working with AWS Lambda + Step Functions and Apache Airflow backends. Prefect and Dagster backends are next.
If you've ever stared at a broken DAG at 2am wondering why the retry logic is firing for the wrong exception type, I'd love your feedback.
Star the repo. Break it. Tell me what's missing.
The goal is simple: pipelines should understand intent, not just instructions.
Viswanath Nagarajan is a Data Platform Engineer building AgentFlow in public. Follow on LinkedIn for weekly data engineering + AI posts.