Why the Linux Kernel Doesn't Use Modulo (%) | High-Perf Queues

3.3K views
Jan 20, 2026
37:55

In Part 1, we built a queue that was easy to read using the modulo operator. It worked, but it was slow. Today, we're going to delete that code. We're going to use the power-of-two rule and bitwise math to build a branchless queue, the same way the Linux kernel handles its data structures.

Discord: https://discord.codotaku.com
Code: https://github.com/CodesOtakuYT/codotaku_ds

In computer science, a queue is an abstract data type that serves as an ordered collection of entities. By convention, the end of the queue where elements are added is called the back, tail, or rear of the queue. The end of the queue where elements are removed is called the head or front of the queue. The name is an analogy to people waiting in line for goods or services.

A queue supports two main operations: enqueue, which adds one element to the rear of the queue, and dequeue, which removes one element from the front of the queue. Other operations may also be allowed, often including a peek or front operation that returns the value of the next element to be dequeued without dequeuing it.

These operations make the queue a first-in-first-out (FIFO) data structure: the first element added to the queue is the first one removed. Equivalently, once a new element is added, all elements that were added before it have to be removed before the new element can be removed. A queue is an example of a linear data structure, or more abstractly a sequential collection.

Queues are common in computer programs, where they are implemented as data structures coupled with access routines, as an abstract data structure, or in object-oriented languages as classes. A queue may be implemented with a circular buffer or a linked list, or by using both the stack pointer and the base pointer.
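To make the circular-buffer idea concrete, here is a minimal sketch in C of a fixed-size queue that wraps its indices with a bitwise AND instead of the modulo operator. Every name here (Queue, Q_CAPACITY, q_init, q_enqueue, q_dequeue, q_peek, is_power_of_two) is hypothetical and chosen for illustration; this is not the codotaku_ds API. The sketch assumes the capacity is a power of two and keeps one slot unused so that a full queue can be told apart from an empty one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for illustration only; not the codotaku_ds API. */
#define Q_CAPACITY 8u /* assumed to be a power of two */

typedef struct {
    int    items[Q_CAPACITY];
    size_t head; /* index of the next element to dequeue */
    size_t tail; /* index of the next free slot */
    size_t mask; /* capacity - 1, stored so wrapping is a single AND */
} Queue;

/* A power of two has exactly one bit set, so n & (n - 1) clears it to 0. */
static inline bool is_power_of_two(size_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}

static inline void q_init(Queue *q) {
    assert(is_power_of_two(Q_CAPACITY));
    q->head = 0;
    q->tail = 0;
    q->mask = Q_CAPACITY - 1;
}

/* Adds one element at the rear; returns false if the queue is full. */
static inline bool q_enqueue(Queue *q, int value) {
    size_t next = (q->tail + 1) & q->mask; /* same result as % capacity */
    if (next == q->head) {
        return false; /* full: one slot is deliberately left unused */
    }
    q->items[q->tail] = value;
    q->tail = next;
    return true;
}

/* Removes one element from the front; returns false if the queue is empty. */
static inline bool q_dequeue(Queue *q, int *out) {
    if (q->head == q->tail) {
        return false;
    }
    *out = q->items[q->head];
    q->head = (q->head + 1) & q->mask;
    return true;
}

/* Reads the front element without removing it. */
static inline bool q_peek(const Queue *q, int *out) {
    if (q->head == q->tail) {
        return false;
    }
    *out = q->items[q->head];
    return true;
}
```

Only the index wrap-around is branch-free in this sketch; the full and empty checks still branch. With a capacity of 8, at most 7 elements fit at once because of the reserved slot.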
Queues provide services in computer science, transport, and operations research where various entities such as data, objects, persons, or events are stored and held to be processed later. In these contexts, the queue performs the function of a buffer. Another usage of queues is in the implementation of breadth-first search.

00:00 Introduction: Why the Linux Kernel avoids modulo (%) in high-perf queues.
01:34 Overview of the codotaku_ds header-only library.
02:48 Setting up the environment and NeoVim workflow.
03:44 Visualizing the current circular queue implementation.
04:12 The Optimization: Replacing the modulo operator with bitwise AND.
05:30 The Power of Two requirement and how to check for it using bitwise logic.
08:00 Performance Tip: Storing a mask instead of capacity to reduce operations.
10:07 Why accessor functions (like q_capacity) are vital for API stability.
11:35 Hardware comparison: CPU cycles of modulo vs. bitwise operators.
14:35 Handling the three cases for calculating queue element count.
18:56 The naive approach to counting elements in a wrapped-around queue.
21:40 Using bitwise math to solve the "wrap-around" count problem.
30:42 Branchless programming: Eliminating if/else to prevent CPU stalls.
32:13 Implementing a helper function for queue distance.
37:38 Conclusion and teaser for Part 3.
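The chapters on counting elements can be illustrated with a short sketch in C. It contrasts a branching count that handles the three cases (empty, tail ahead of head, tail wrapped behind head) with a branch-free distance computed by unsigned subtraction and a mask. The function names q_count_branchy and q_distance are hypothetical and may not match the video's code; the sketch assumes head and tail are indices in the range 0 to capacity - 1 and that capacity is a power of two.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for illustration only; not the codotaku_ds API. */

/* Naive count: branch on the relative position of head and tail. */
static inline size_t q_count_branchy(size_t head, size_t tail, size_t capacity) {
    if (tail == head) {
        return 0;                      /* case 1: empty */
    }
    if (tail > head) {
        return tail - head;            /* case 2: no wrap-around */
    }
    return capacity - head + tail;     /* case 3: tail has wrapped */
}

/*
 * Branch-free distance from `from` to `to` around the ring.
 * Unsigned subtraction wraps modulo 2^N when `to` is smaller than `from`,
 * and the AND keeps only the low bits, which reduces the result modulo
 * the capacity. This only works when mask is a power of two minus one.
 */
static inline size_t q_distance(size_t from, size_t to, size_t mask) {
    assert(((mask + 1) & mask) == 0);
    return (to - from) & mask;
}
```

For example, with a capacity of 8 (mask 7), head 6 and tail 2, both versions give 4: the branching version computes 8 - 6 + 2, and the branch-free version computes (2 - 6) & 7. Whether the branch-free form is measurably faster depends on the compiler and CPU, so it is worth benchmarking rather than assuming.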
