Apache Airflow (EN)
Tool: Open-source platform for workflow orchestration
Apache Airflow
Apache Airflow is an open-source platform for workflow orchestration, built for scheduling, executing, and monitoring complex data pipelines. Pipelines are defined in Python as DAGs (Directed Acyclic Graphs) that model the dependencies between tasks and determine the order in which they run. The platform provides a web UI for monitoring pipeline runs and supports configurable retry policies for tasks that fail.
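The two ideas above, dependency-ordered execution and retries, can be illustrated with a minimal sketch in plain Python. This is not Airflow's API: `run_pipeline`, the task names, and the `max_retries` parameter are hypothetical, and the sketch runs everything sequentially in one process, whereas Airflow schedules tasks across workers.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, dependencies, max_retries=2):
    """Run callables in dependency order, retrying each failed task.

    tasks: dict mapping task name -> zero-argument callable
    dependencies: dict mapping task name -> set of upstream task names
    Returns a dict mapping task name -> number of attempts used.
    """
    # Include tasks that have no declared upstream dependencies.
    graph = {name: dependencies.get(name, set()) for name in tasks}
    attempts = {}
    # static_order() raises CycleError (a ValueError) if the graph has a cycle.
    for name in TopologicalSorter(graph).static_order():
        for attempt in range(1, max_retries + 2):
            try:
                tasks[name]()
                attempts[name] = attempt
                break
            except Exception:
                if attempt == max_retries + 1:
                    raise  # retries exhausted: fail the whole run
    return attempts


# Hypothetical three-step ETL pipeline; transform fails once, then succeeds.
log = []
transform_calls = {"count": 0}


def extract():
    log.append("extract")


def transform():
    transform_calls["count"] += 1
    if transform_calls["count"] == 1:
        raise RuntimeError("transient failure")
    log.append("transform")


def load():
    log.append("load")


result = run_pipeline(
    tasks={"extract": extract, "transform": transform, "load": load},
    dependencies={"transform": {"extract"}, "load": {"transform"}},
)
print(result)
```

In this sketch a task that exhausts its retries aborts the whole run. A real orchestrator would typically record the failure and mark downstream tasks as not runnable instead.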
Architecture
flowchart TD
    A[Web UI] --> B[Scheduler]
    B --> C[Executor]
    C --> D[Worker Nodes]
    D --> E[Database]
    B --> E
    A --> E
    F[Metadata Store] --> E
In Context
- Typically used together with Apache Spark, BigQuery, Redshift, and other data processing tools
- Related to: Luigi, Prefect, Dagster, Kubeflow
- Example use case: ETL pipelines in data warehousing environments