CASE STUDY

Furrballs (NUMA-Aware Caching Library)

A NUMA-aware caching library that treats memory topology as a first-class input to cache placement and eviction decisions, evaluated with a 5-step ablation study and documented in a published whitepaper.

Challenge

Modern multi-socket servers expose a NUMA topology in which remote memory access can be 1.2-2x slower than local access. Existing caching systems (Redis, Memcached) ignore NUMA entirely, and CacheLib uses static, coarse-grained sharding. No published system uses NUMA topology as an adaptive input to both per-page cache placement and the eviction policy.

Approach

Built a C++20 library with per-node physical block allocation, PMR-backed containers, bump-packed multi-value pages, and a per-node sharded KeyStore. Implemented lock-free reads with a SeqLock, so readers never acquire a lock or write to shared lock state. Designed two key-routing strategies: round-robin and thread-local. Created a 5-step ablation study that isolates each architectural decision, along with a shared-nothing variant based on MPSC queues, simulated NUMA latency injection, and a cross-VM methodology for isolating baselines. Published a technical whitepaper on Zenodo with a DOI.
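The PMR-backed, bump-packed page idea can be sketched with the standard `<memory_resource>` header. This is a generic illustration of the technique, not Furrballs' code: `PackedPage`, `put`, `contains`, and `resource` are hypothetical names, the 4 KiB page size is arbitrary, and the page buffer is an ordinary member array rather than memory bound to a particular NUMA node (a real implementation would obtain node-local memory, for example through libnuma).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory_resource>
#include <new>
#include <string_view>

// Hypothetical sketch: several values bump-packed into one fixed-size page.
// In a NUMA-aware cache the page buffer would live on a specific node;
// here it is a plain member array so the example runs anywhere.
class PackedPage {
public:
    static constexpr std::size_t kPageSize = 4096;  // arbitrary for the sketch

    // monotonic_buffer_resource is a bump allocator over buf_. Using
    // null_memory_resource() as upstream makes an allocation that does not
    // fit throw std::bad_alloc instead of spilling to the global heap.
    PackedPage() : arena_(buf_, kPageSize, std::pmr::null_memory_resource()) {}

    PackedPage(const PackedPage&) = delete;
    PackedPage& operator=(const PackedPage&) = delete;

    // Copies `value` into the page and returns a view of the packed copy.
    std::string_view put(std::string_view value) {
        if (value.empty()) return {};
        char* dst = static_cast<char*>(arena_.allocate(value.size(), alignof(char)));
        std::memcpy(dst, value.data(), value.size());
        std::string_view out{dst, value.size()};
        assert(contains(out));  // debug check: the copy really is inside this page
        return out;
    }

    // True if `v` lies entirely inside this page's buffer.
    bool contains(std::string_view v) const {
        const char* begin = reinterpret_cast<const char*>(buf_);
        return v.data() >= begin && v.data() + v.size() <= begin + kPageSize;
    }

    // A std::pmr container constructed with this resource draws from the same page.
    std::pmr::memory_resource* resource() { return &arena_; }

private:
    alignas(std::max_align_t) std::byte buf_[kPageSize];
    std::pmr::monotonic_buffer_resource arena_;
};
```

A monotonic resource only releases its memory all at once, so a design built on it reclaims whole pages rather than individual values.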
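A SeqLock read path can be sketched as follows. This is a generic version of the technique, not the library's implementation. `SeqLocked`, `read`, `write`, and `version` are hypothetical names, and the sketch assumes writers are serialized by a mutex. The protected words are themselves `std::atomic` values accessed with relaxed ordering, because a plain read that overlaps a concurrent write is a data race (undefined behavior) in the C++ memory model. The fences order the data accesses relative to the sequence counter.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <thread>

// Minimal SeqLock sketch (hypothetical names, not the library's API).
// Writers are serialized by a mutex; readers take no lock and simply
// retry if a write overlapped their read.
template <std::size_t N>
class SeqLocked {
public:
    using Words = std::array<std::uint64_t, N>;

    void write(const Words& in) {
        std::lock_guard<std::mutex> guard(write_mu_);
        const std::uint64_t s = seq_.load(std::memory_order_relaxed);
        seq_.store(s + 1, std::memory_order_relaxed);         // odd: write in progress
        std::atomic_thread_fence(std::memory_order_release);  // odd seq before data stores
        for (std::size_t i = 0; i < N; ++i)
            data_[i].store(in[i], std::memory_order_relaxed);
        seq_.store(s + 2, std::memory_order_release);         // even: write published
    }

    Words read() const {
        Words out{};
        for (;;) {
            const std::uint64_t s1 = seq_.load(std::memory_order_acquire);
            if (s1 & 1) {  // a writer is active: back off and retry
                std::this_thread::yield();
                continue;
            }
            for (std::size_t i = 0; i < N; ++i)
                out[i] = data_[i].load(std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_acquire);  // data loads before re-check
            const std::uint64_t s2 = seq_.load(std::memory_order_relaxed);
            if (s1 == s2) return out;  // no write overlapped: snapshot is consistent
        }
    }

    // Even when idle; advances by 2 per completed write.
    std::uint64_t version() const { return seq_.load(std::memory_order_acquire); }

private:
    std::atomic<std::uint64_t> seq_{0};
    std::array<std::atomic<std::uint64_t>, N> data_{};
    std::mutex write_mu_;
};
```

A reader never stores to shared memory: it only loads the sequence counter and the data, retrying if the counter was odd or changed. A `std::shared_mutex` reader, by contrast, typically has to update shared lock state, which adds cross-node cache-line traffic. That difference is one plausible reason a SeqLock read path exposes more of the underlying NUMA latency.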

Outcome

Demonstrated that SeqLock lock-free reads expose more than twice the NUMA signal of shared_mutex (11.7% vs 5.1% p50 cross-node overhead). Thread-local routing with SeqLock achieves a 26-41% improvement over round-robin routing. Per-node sharding delivers 3x concurrent Set throughput through lock partitioning. The ablation study isolates the contribution of each design decision. The shared-nothing variant reaches break-even at roughly 21 cache misses per operation on Xeon hardware.
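Lock partitioning can be sketched as one mutex per node-local shard. The class below is hypothetical: `ShardedKeyStore`, `set`, `get`, and `shard_size` are illustrative names, shard memory is not actually bound to a NUMA node, and the 64-byte alignment is an assumed cache-line size. The caller passes the shard index, which stands in for whatever routing policy chooses it. With thread-local routing, the index would be the calling thread's node.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical sketch (not the library's API): one shard per NUMA node,
// each guarded by its own mutex. Two Sets contend only when they target
// the same shard, which is the lock-partitioning effect.
class ShardedKeyStore {
public:
    explicit ShardedKeyStore(std::size_t num_nodes) : shards_(num_nodes) {
        assert(num_nodes > 0);
    }

    // `node` is the shard picked by the routing policy.
    void set(std::size_t node, const std::string& key, std::string value) {
        Shard& s = shards_.at(node);
        std::lock_guard<std::mutex> lock(s.mu);
        s.map[key] = std::move(value);
    }

    std::optional<std::string> get(std::size_t node, const std::string& key) {
        Shard& s = shards_.at(node);
        std::lock_guard<std::mutex> lock(s.mu);
        auto it = s.map.find(key);
        if (it == s.map.end()) return std::nullopt;
        return it->second;
    }

    std::size_t shard_size(std::size_t node) {
        Shard& s = shards_.at(node);
        std::lock_guard<std::mutex> lock(s.mu);
        return s.map.size();
    }

private:
    // Aligned so that neighbouring shards' mutexes do not share a cache line
    // (64 bytes is an assumed line size).
    struct alignas(64) Shard {
        std::mutex mu;
        std::unordered_map<std::string, std::string> map;
    };
    std::vector<Shard> shards_;  // sized once; Shard is never moved
};
```

In this sketch a key lives only in the shard it was written to, so a lookup must repeat the routing decision used for the write. A per-thread policy satisfies that naturally when the same thread reads and writes the key.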
