Dataflow Gen2 - Low-Code ETL in Fabric
Dataflow Gen2 – Low-Code ETL in Microsoft Fabric (Architecture, Demo & Best Practices)

In this episode of Fabric Friday, we explore Dataflow Gen2 in Microsoft Fabric — a scalable, low-code ETL solution built on Power Query, Spark and OneLake. Dataflow Gen2 keeps the familiar Power BI transformation experience, but adds disk-based staging and Spark execution, allowing you to process large datasets efficiently and reliably.

🔍 What you’ll learn

Dataflow Gen2 architecture
• How Gen2 differs from Gen1
• Why staging on disk enables large-scale ETL
• How Spark and Parquet are used behind the scenes

Practical demo
• Importing data from an OData source
• Transformations with Power Query (a grouping/merge sketch in M follows the chapter list)
• Preview, data quality and diagram view
• Writing outputs to Lakehouse and Warehouse

Multi-destination & incremental refresh
• Writing to multiple targets in one run
• Bronze / silver / gold layering
• Incremental loading using modified timestamps (a filter sketch follows the chapter list)

Integration with pipelines
• Calling dataflows from pipelines
• Separating ETL logic from orchestration

Performance & best practices
• When to enable staging
• Filtering and column reduction
• Naming and modular query design

⸻

⭐ Key Takeaways
• Dataflow Gen2 enables enterprise-scale, low-code ETL
• Staging + Spark are the foundation for scalability
• Multi-destination outputs simplify medallion architectures
• Incremental refresh is essential for performance
• Pipelines and dataflows work best together

⸻

00:00 Matthias is here, your trusted advisor
00:02 Let's talk about dataflows and enterprise-grade ETL pipelines without code
00:11 End-to-end architecture overview
00:26 Over 150 data sources supported (SQL Server, cloud databases, files, APIs, etc.)
01:02 ETL process: extract with connectors, transform with Power Query, stage to the Lakehouse, then fast copy to the destination
02:12 Staging explained – Spark-based, fast and scalable, with the Power Query interface
03:16 Difference between Dataflow Gen1 and Gen2 explained (in-memory vs. Spark + staging)
04:26 Gen2 architecture visualized
05:11 Gen1 architecture and limitations
05:42 Recommendation to always use Gen2 with staging enabled by default
06:07 Demo start: create a Dataflow Gen2, import Northwind data with the OData connector
07:22 Data preview and transformation basics in the Power Query interface
08:01 Data profiling features (histograms, statistics, diagram view)
09:30 Data transformation: grouping and counting orders per customer
10:10 Merging the Customers and Orders queries into a combined table
11:00 Column selection and expansion in the merged table
14:00 Adding a rank column to the data
15:00 Setting up data destinations (Lakehouse, Warehouse, SQL DB)
16:00 Disabling the staging option for the demo (not recommended for large datasets)
18:50 Performance comparison: staging enabled vs. disabled
19:40 Multi-destination pattern in a single dataflow (bronze, silver, gold layers)
20:45 Explanation of the incremental refresh pattern (requires a timestamp column; only changed data is loaded)
23:00 Integrating Dataflow Gen2 into pipelines for orchestration and scheduling
24:10 Demo: creating a pipeline and running a dataflow activity
29:07 Common community questions:
29:22 - When to use Dataflow Gen2 vs. pipelines
29:58 - How to speed up slow dataflows (enable staging, filter early, remove columns)
30:29 - Multiple outputs in one dataflow are possible
30:49 - Learning path (start with Dataflow Gen2 before notebooks)
31:21 Tips & tricks: transformations, naming steps, starting small with datasets
32:22 Thanks and closing remarks
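The grouping and merge steps from the demo (09:30–11:00) translate to a few lines of Power Query M. The sketch below is illustrative only — the table names, CustomerID key and the public Northwind OData URL are assumptions based on the standard Northwind sample feed, not taken from the episode:

```m
// Minimal sketch, assuming the standard Northwind OData feed
// (Customers and Orders tables joined on CustomerID).
let
    Source = OData.Feed("https://services.odata.org/V4/Northwind/Northwind.svc/"),
    Customers = Source{[Name = "Customers", Signature = "table"]}[Data],
    Orders = Source{[Name = "Orders", Signature = "table"]}[Data],

    // Group orders per customer and count them (roughly the 09:30 step)
    OrdersPerCustomer = Table.Group(
        Orders,
        {"CustomerID"},
        {{"OrderCount", each Table.RowCount(_), Int64.Type}}
    ),

    // Merge the aggregate back onto Customers (roughly the 10:10 step)
    Merged = Table.NestedJoin(
        Customers, {"CustomerID"},
        OrdersPerCustomer, {"CustomerID"},
        "OrderStats", JoinKind.LeftOuter
    ),

    // Expand only the column we need from the nested table (11:00 step)
    Expanded = Table.ExpandTableColumn(Merged, "OrderStats", {"OrderCount"})
in
    Expanded
```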
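The incremental refresh pattern discussed at 20:45 boils down to filtering on a modified-date column so that only changed rows are pulled per run. In Fabric the refresh window is configured in the dataflow's query settings, but the underlying filter looks roughly like this sketch — server, database, and the LastModified/LastLoadTimestamp names are hypothetical:

```m
// Illustrative only: load rows changed since the last successful run.
let
    Source = Sql.Database("myserver.database.windows.net", "SalesDb"),
    Orders = Source{[Schema = "dbo", Item = "Orders"]}[Data],

    // Watermark from the previous run (e.g. kept in a small control table)
    LastLoadTimestamp = #datetime(2024, 1, 1, 0, 0, 0),

    // Only rows modified after the watermark; this predicate can fold back
    // to the source, so unchanged rows are never transferred
    ChangedRows = Table.SelectRows(Orders, each [LastModified] > LastLoadTimestamp)
in
    ChangedRows
```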