We are building AI inference infrastructure — not just a faster SSD.

AI scaling has hit a wall — not compute, but memory. We’re building HBF, a new memory layer between HBM and SSD that delivers near-HBM latency with multi-terabyte capacity. This unlocks ~3x more models per GPU and significantly lowers cost per inference.

2 controllers completed in the first working prototype
32+ controllers planned in the next 3D-stacked architecture
1.6+ TB/s target bandwidth for next-generation HBF
Overview

Our technology lowers cost per token

GPU memory usage ↓ 30–60%, Models per node ↑ 3–5x, Context length ↑ 10x+

LLM inference (weight streaming)

Streams model weights on demand from HBF, so the full model no longer needs to reside in GPU memory.
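A minimal sketch of the pattern, with HBFStore as a hypothetical placeholder (plain NumPy arrays stand in for HBF-resident weights; a real deployment would issue reads from HBF into HBM):

```python
import numpy as np

class HBFStore:
    """Stand-in for an HBF-backed weight store (here just host arrays)."""
    def __init__(self, layer_shapes):
        self._layers = {name: np.zeros(shape, dtype=np.float32)
                        for name, shape in layer_shapes.items()}

    def fetch(self, name):
        # In a real deployment this would be a DMA read from HBF into HBM.
        return self._layers[name]

def run_inference(store, layer_names, x):
    for name in layer_names:
        w = store.fetch(name)   # stream only this layer's weights
        x = x @ w               # run the layer
        del w                   # weights can be evicted right away
    return x

store = HBFStore({f"layer{i}": (64, 64) for i in range(4)})
out = run_inference(store, [f"layer{i}" for i in range(4)],
                    np.ones((1, 64), dtype=np.float32))
print(out.shape)  # (1, 64)
```

Only one layer's weights are resident at a time, which is why GPU memory usage drops even for models much larger than HBM.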

KV cache scaling

Offloads the KV cache to HBF with deterministic access, enabling long-context inference without exhausting GPU memory.
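A minimal sketch of the spill policy, assuming a hypothetical two-tier KVCache in which Python dictionaries stand in for HBM and HBF:

```python
import numpy as np

GPU_BUDGET_BLOCKS = 4  # KV blocks kept "hot" in GPU memory

class KVCache:
    def __init__(self):
        self.hot = {}   # block_id -> array, stands in for HBM
        self.cold = {}  # block_id -> array, stands in for HBF

    def append(self, block_id, kv_block):
        self.hot[block_id] = kv_block
        if len(self.hot) > GPU_BUDGET_BLOCKS:
            oldest = min(self.hot)                     # evict the oldest block
            self.cold[oldest] = self.hot.pop(oldest)   # spill it to the HBF tier

    def gather(self):
        # Deterministic access: block ids map directly to HBF locations,
        # which makes prefetching for the next attention step straightforward.
        ids = sorted(set(self.hot) | set(self.cold))
        return np.concatenate(
            [self.hot.get(i, self.cold.get(i)) for i in ids], axis=0)

cache = KVCache()
for step in range(8):
    cache.append(step, np.random.rand(16, 128).astype(np.float32))
print(cache.gather().shape)  # (128, 128): all 8 blocks visible, only 4 held hot
```

The context length seen by attention grows with HBF capacity, while GPU memory holds only the hot working set.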

MoE expert loading

Dynamically loads experts from HBF in real time, letting sparse models scale without being bound by GPU memory capacity.
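A minimal sketch of gated, on-demand expert loading; ExpertStore is a hypothetical placeholder for expert weights resident in HBF rather than HBM:

```python
import numpy as np

class ExpertStore:
    """Stand-in for MoE expert weights kept in HBF rather than HBM."""
    def __init__(self, n_experts, d):
        self._experts = [np.random.rand(d, d) for _ in range(n_experts)]

    def load(self, idx):
        # In a real deployment this would be an on-demand read from HBF.
        return self._experts[idx]

def moe_layer(x, router_logits, store, top_k=2):
    picked = np.argsort(router_logits)[-top_k:]   # top-k experts for this token
    gate = np.exp(router_logits[picked])
    gate /= gate.sum()                            # softmax over the chosen experts
    out = np.zeros_like(x)
    for g, idx in zip(gate, picked):
        expert_w = store.load(idx)                # fetch only the chosen experts
        out += g * (x @ expert_w)
    return out

d, n_experts = 64, 16
store = ExpertStore(n_experts, d)
x = np.random.rand(1, d)
y = moe_layer(x, np.random.rand(n_experts), store)
print(y.shape)  # (1, 64)
```

Since only the routed experts are fetched per token, total expert count can grow with HBF capacity rather than with HBM.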