This allows for writing code that instantiates pipelines dynamically. Dynamic: Airflow pipelines are configuration as code (Python), allowing for dynamic pipeline generation. For high-volume, data-intensive tasks, a best practice is to delegate to external services specializing in that type of work.Īirflow is not a streaming solution, but it is often used to process real-time data, pulling data off streams in batches. Other similar projects include Luigi, Oozie and Azkaban.Īirflow is commonly used to process data, but has the opinion that tasks should ideally be idempotent (i.e., results of the task will be the same, and will not create duplicated data in a destination system), and should not pass large quantities of data from one task to the next (though tasks can pass metadata using Airflow's XCom feature). When the DAG structure is similar from one run to the next, it clarifies the unit of work and continuity.
0 Comments
Leave a Reply. |
Details
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |