Data Warehouse Ingestion Patterns with Apache NiFi

6.4K views
Apr 12, 2024
21:46

This video talks through the pros and cons of three patterns you can use in Apache NiFi to ingest data into a table created with the Iceberg format.

- 1st option: PutIceberg. Simply push data using the PutIceberg processor. Very efficient, but it essentially only inserts new data into the table, so it may not be a fit in all cases.
- 2nd option: PutDatabaseRecord. A great option that is a bit more generic than the previous one, useful if the destination is not an Iceberg-formatted table. In this case the data is sent over JDBC. Great for small datasets, but not efficient for huge datasets.
- 3rd option: Staging area with external temporary tables. A bit more involved in terms of flow design, but more reliable, very flexible, and very efficient because it delegates most of the work to the query engine. In this case data is pushed into a staging area of the object store. You create an external table on top of it, merge the data from that external table into your final table, and then do some cleanup.

Thanks for watching the video! As always, feel free to ask questions and share your feedback. And let me know what you'd like to see for the next video!
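To make the third option more concrete, here is a minimal Python sketch of the SQL a flow might send to the query engine once the files have landed in the staging area. It is not taken from the video: the function name `build_staging_statements`, the table names, the Parquet file format, the `s3a://` location, and the Hive/Impala-style SQL dialect are all assumptions for illustration, and the exact `MERGE` syntax depends on the engine you use.

```python
from typing import Dict, List


def build_staging_statements(
    target_table: str,
    staging_table: str,
    staging_location: str,
    columns: Dict[str, str],
    key_columns: List[str],
) -> List[str]:
    """Return the SQL for one staging-and-merge cycle, in execution order.

    Hypothetical helper: the SQL dialect and the Parquet format are assumptions.
    """
    missing = [k for k in key_columns if k not in columns]
    if not key_columns or missing:
        raise ValueError(f"key columns must be a non-empty subset of columns: {missing}")

    # Step 1: expose the staged files as an external table.
    column_defs = ", ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
    create_external = (
        f"CREATE EXTERNAL TABLE {staging_table} ({column_defs}) "
        f"STORED AS PARQUET LOCATION '{staging_location}'"
    )

    # Step 2: merge staged rows into the final table, matching on the key columns.
    on_clause = " AND ".join(f"t.{k} = s.{k}" for k in key_columns)
    non_keys = [c for c in columns if c not in key_columns]
    set_clause = ", ".join(f"{c} = s.{c}" for c in non_keys)
    insert_cols = ", ".join(columns)
    insert_vals = ", ".join(f"s.{c}" for c in columns)
    merge = f"MERGE INTO {target_table} t USING {staging_table} s ON {on_clause} "
    if set_clause:
        merge += f"WHEN MATCHED THEN UPDATE SET {set_clause} "
    merge += f"WHEN NOT MATCHED THEN INSERT ({insert_cols}) VALUES ({insert_vals})"

    # Step 3: drop the external table. This usually removes only the table
    # definition; the staged files still have to be deleted separately.
    drop = f"DROP TABLE IF EXISTS {staging_table}"

    return [create_external, merge, drop]


if __name__ == "__main__":
    for statement in build_staging_statements(
        target_table="sales",
        staging_table="sales_stage",
        staging_location="s3a://bucket/staging/sales/",
        columns={"id": "BIGINT", "amount": "DOUBLE"},
        key_columns=["id"],
    ):
        print(statement + ";")
```

Because the merge runs inside the query engine, this pattern can handle updates as well as inserts, which is the flexibility the third option trades for a more involved flow design.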
