This Data Structure Makes Postgres Inserts Fast
Every time you INSERT a row, Postgres has to find a page with enough free space - across a table that could have millions of pages. How does it do this fast? The answer is a clever data structure called the Free Space Map — a 3-level max-heap stored on disk, with a round-robin search to keep inserts balanced. In this video, we build up the FSM from first principles: what properties we need, why a max-heap fits, and how Postgres extends it to handle billions of pages while reading only 3 pages per search. Relevant Links: 1. README: https://github.com/postgres/postgres/blob/master/src/backend/storage/freespace/README 2. Round-Robin Based Search: https://github.com/postgres/postgres/blob/master/src/backend/storage/freespace/fsmpage.c 3. Page Level Traversal: https://github.com/postgres/postgres/blob/master/src/backend/storage/freespace/freespace.c 4. Dockerfile: https://github.com/thewalstreetjournal/postgres-playlist/blob/main/003-fsm-data-structure/Dockerfile PostgreSQL Internals This video is the third video in the PostgreSQL Internals series. The other videos could be found here: https://www.youtube.com/playlist?list=PLNdSdxHvxQhhlXcS_NGQ1rmMTJ25Asdvn Timestamps 0:00 - Introduction 0:40 - Wishlist from FSM 3:00 - Max Heap as FSM 8:34 - 3-Level Max-Heap 12:37 - Terminal Overview --- The following points are deliberately left out from the video, to keep it simple: 1. Postgres doesn't store free space exactly. It quantizes it. For instance, for an 8KB page, we need 13 bits. However, postgres uses only 8. This results in some loss of acccuracy, but given the use case, this is considered fine. 2. The FSM is not WAL-logged. It's treated as hint data. If Postgres crashes and the FSM is slightly wrong after recovery, that's fine — VACUUM will correct it. 3. The FSM is updated mostly by a background process (VACUUM), to keep writes fast. If during an insert, the writer finds the page full due to stale data/other writes, then the writer also updates the FSM page- but this is strictly when required. That means, the FSM file can be stale and contain incorrect data. The writers are expected to retry if the insert fails due to this inaccuracy. 4. Pages in Postgres are configurable. So depending on the size of the page set, the quantization math and the loss of accuracy might change. The depth of the FSM page tree is a compile time constant and it's fixed at 3. It's reasonably large for almost all cases. ---
Download
0 formatsNo download links available.