
Apache Paimon - Open Table Format built for Real-Time Data Pipelines

164 views
Sep 2, 2025
5:51

#Iceberg added row identifiers (row lineage) in V3. #Paimon was designed with Primary Key support at its core. And it's Iceberg-compatible for reads. How?

Apache Paimon is an #OpenTableFormat designed for #Streaming. It supports Append-only Tables and Primary Key Tables.

▪️▪️ 𝗛𝗼𝘄 𝗣𝗞 𝗧𝗮𝗯𝗹𝗲𝘀 𝘄𝗼𝗿𝗸

Append-only tables are nothing revolutionary, but I want to explain 𝗵𝗼𝘄 𝗣𝗞 𝗧𝗮𝗯𝗹𝗲𝘀 𝘄𝗼𝗿𝗸, because it took me a while to grasp. I'll be simplifying a lot to help you understand the core idea.

🧠 𝗠𝗲𝗺𝘁𝗮𝗯𝗹𝗲 𝗮𝗻𝗱 𝗦𝗼𝗿𝘁𝗲𝗱 𝗥𝘂𝗻𝘀

You configure #Flink to write in the Paimon format. Flink buffers writes in memory in a memtable, and after a configured interval (~1-3 minutes) the memtable is sorted by primary key, merged, and persisted as a Sorted Run. A Sorted Run consists of one or more data files sorted by primary key, and the key ranges of those files never overlap. The new Sorted Run is appended to level 0 of the LSM Tree. Primary key ranges of Sorted Runs on the same level of the tree never overlap, except at level 0, where freshly flushed runs have not been compacted yet and may overlap.

🌳 𝗖𝗼𝗺𝗽𝗮𝗰𝘁𝗶𝗼𝗻 𝗼𝗳 𝘁𝗵𝗲 𝗟𝗦𝗠 𝗧𝗿𝗲𝗲

Compaction starts at level 0 of the tree. Levels are processed sequentially, starting from the bottom (level 0). During compaction, the Sorted Runs on each level are merged into higher levels. Once compacted, the key ranges of Sorted Runs on the same level of the LSM Tree no longer overlap.

▪️▪️

Paimon also supports Append-only Tables if you don't want all these compactions. Both PK Tables and Append-only Tables support the #Parquet file format and can be configured to produce Iceberg metadata for compatibility with Iceberg readers.

There's much more to Paimon. If you're interested, other core concepts are Buckets and Merge Engines. If you're building or planning to build real-time data pipelines that move huge volumes of data, it's definitely a technology worth understanding.

--

The video is just a fragment from a 1.5h Data Pulse episode "𝗦𝘁𝗿𝗲𝗮𝗺 𝗣𝗿𝗼𝗰𝗲𝘀𝘀𝗶𝗻𝗴 𝘄𝗶𝘁𝗵 𝗚𝗶𝗮𝗻𝗻𝗶𝘀 𝗣𝗼𝗹𝘆𝘇𝗼𝘀" hosted by Jan Siekierski and Sachin Tripathi (who joined later): https://youtube.com/live/Vl4Ql3H0tto?feature=share
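
To make the memtable, Sorted Run and compaction flow from the description a bit more concrete, here is a toy sketch in Python. It is not Paimon's API or implementation: `ToyLsmTable`, `SortedRun`, and the `write`, `flush`, `compact` and `read` methods are hypothetical names made up for illustration. It assumes only two levels, keeps everything in memory, and triggers flushes manually instead of on a timer.

```python
from dataclasses import dataclass


@dataclass
class SortedRun:
    """Batch of (key, value) rows, sorted by key, one row per key."""
    rows: list

    @property
    def key_range(self):
        # (smallest key, largest key) covered by this run
        return self.rows[0][0], self.rows[-1][0]


class ToyLsmTable:
    """Hypothetical two-level LSM tree; an illustration, not Paimon code."""

    def __init__(self):
        self.memtable = {}            # in-memory buffer: key -> latest value
        self.levels = {0: [], 1: []}  # level -> Sorted Runs, oldest first

    def write(self, key, value):
        # A later write to the same primary key replaces the earlier one.
        self.memtable[key] = value

    def flush(self):
        # Persist the memtable as one Sorted Run appended to level 0.
        if self.memtable:
            self.levels[0].append(SortedRun(sorted(self.memtable.items())))
            self.memtable = {}

    def compact(self):
        # Merge level 1 and level 0 runs, oldest first, so newer rows win,
        # and store the result as a single non-overlapping run on level 1.
        merged = {}
        for run in self.levels[1] + self.levels[0]:
            merged.update(run.rows)
        self.levels[0] = []
        self.levels[1] = [SortedRun(sorted(merged.items()))] if merged else []

    def read(self, key):
        # Check the memtable first, then runs from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.levels[1] + self.levels[0]):
            for k, v in run.rows:
                if k == key:
                    return v
        return None


table = ToyLsmTable()
table.write(3, "c")
table.write(1, "a")
table.flush()
table.write(2, "b")
table.write(3, "c2")  # update of primary key 3
table.flush()

# Two level-0 runs whose key ranges overlap: (1, 3) and (2, 3)
print([run.key_range for run in table.levels[0]])

table.compact()
# One level-1 run; key 3 keeps only its latest value
print(table.levels[1][0].rows)
```

Before compaction a reader has to consult every level-0 run because their key ranges overlap. After compaction a single run holds each primary key once, which is the property the description attributes to the compacted levels.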

