Scaling Intelligence Through the Memory Hierarchy with Solidigm

May 21, 2026
1:01:07

Solidigm's presentation at AI Field Day 8, led by Kapil Karkra, highlighted memory capacity as a critical but often overlooked third axis for scaling AI intelligence, alongside model size and compute power. Solidigm introduced its "CRAFT" framework to define and measure AI intelligence across five dimensions: Comprehension, Recall, Adaptability, Fluency, and Tenacity. The core argument is that expanding memory capacity beyond the GPU's high-bandwidth memory (HBM) to system DRAM and NVMe SSDs dramatically improves AI performance and quality by enabling more efficient inference and preventing costly recomputation.

Through various benchmarks and experiments, Solidigm demonstrated the impact of memory capacity on each CRAFT dimension. For Recall, offloading the Key-Value (KV) cache to SSDs prevented the GPU from recomputing previous states, significantly boosting throughput. Tenacity was illustrated with an AIME 2024 math test, where increased output token capacity allowed the model to deliberate longer and achieve a higher score, showing how more "scratch space" leads to better reasoning quality. Adaptability, measured by requests per second, and Fluency, indicated by inter-token latency, both saw substantial improvements (up to 4x throughput and 21x better latency) when NVMe SSDs extended the KV cache, allowing the system to handle more concurrent requests without compromising responsiveness. Similarly, Comprehension, tested with a "needle in a haystack" benchmark, showed 78 times faster reading when the context fit in the extended cache.

The presentation concluded that while higher-bandwidth storage is beneficial when working sets fit within faster tiers, sheer capacity ultimately becomes paramount for larger, more complex AI workloads involving multiple agents and extensive context lengths. The discussion emphasized the need for a tiered memory hierarchy, where automatic caching across HBM, DRAM, and NVMe SSDs optimizes resource utilization and avoids GPU stalls.
This approach allows organizations to balance performance and cost effectively, ensuring that AI systems can sustain deeper reasoning, handle greater concurrency, and deliver higher-quality, more fluent responses by leveraging expanded memory capacity.

Presented by Kapil Karkra, Sr. Principal Engineer AI Solutions and Software, Solidigm. Recorded live at AI Field Day 8 in San Jose, California on May 14, 2026. Watch the entire presentation at https://techfieldday.com/appearance/solidigm-presents-at-ai-field-day-8/ or visit https://TechFieldDay.com/event/aifd8/ or https://Solidigm.com for more information.
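
The tiered caching idea summarized above can be illustrated with a small toy model. The sketch below is not Solidigm's software or any inference engine's API: the class name `TieredKVCache`, the helper `replay`, the LRU eviction policy, and the tier sizes (counted in abstract "blocks") are all assumptions chosen for illustration. It only shows why adding slower, larger tiers behind a small fast tier can turn recomputation into cache hits when the working set exceeds the fast tier.

```python
from collections import OrderedDict


class TieredKVCache:
    """Toy LRU cache spanning several tiers, fastest first (hypothetical)."""

    def __init__(self, capacities):
        # capacities: list of (tier_name, max_blocks), fastest tier first.
        self.names = [name for name, _ in capacities]
        self.limits = [limit for _, limit in capacities]
        self.tiers = [OrderedDict() for _ in capacities]
        self.hits = {name: 0 for name in self.names}
        self.recomputes = 0

    def _insert(self, level, key, value):
        if level >= len(self.tiers):
            return  # evicted from the slowest tier: the block is dropped
        tier = self.tiers[level]
        tier[key] = value
        tier.move_to_end(key)  # mark as most recently used
        if len(tier) > self.limits[level]:
            # Demote the least recently used block to the next tier down.
            old_key, old_value = tier.popitem(last=False)
            self._insert(level + 1, old_key, old_value)

    def get(self, key, compute):
        # Search tiers from fastest to slowest; promote on a hit.
        for level, tier in enumerate(self.tiers):
            if key in tier:
                self.hits[self.names[level]] += 1
                value = tier.pop(key)
                self._insert(0, key, value)
                return value
        # Miss in every tier: the state must be rebuilt (stand-in for prefill).
        self.recomputes += 1
        value = compute(key)
        self._insert(0, key, value)
        return value


def replay(cache, sessions, rounds):
    """Visit each session in order, `rounds` times, and return the cache."""
    for _ in range(rounds):
        for session in range(sessions):
            cache.get(session, lambda s: f"kv-state-{s}")
    return cache


# Illustrative sizes only: 8 sessions cycling through a 2-block fast tier.
hbm_only = replay(TieredKVCache([("hbm", 2)]), sessions=8, rounds=3)
tiered = replay(
    TieredKVCache([("hbm", 2), ("dram", 2), ("nvme", 4)]), sessions=8, rounds=3
)
print("HBM only recomputes:", hbm_only.recomputes)
print("Tiered recomputes:  ", tiered.recomputes, "hits:", tiered.hits)
```

In this toy run the single-tier cache recomputes on all 24 accesses, while the three-tier cache recomputes only on the first 8 and serves the remaining 16 from the slowest tier. Real systems differ in block granularity, eviction policy, and transfer cost, none of which are modeled here.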
