6 min read
Ionattention: Grace Hopper–Native Inference
How Cumulus built the fastest GPU inference runtime for NVIDIA GH200, reaching 7,167 tok/s on a single chip with coherent CUDA graphs, eager KV writeback, and phantom-tile attention scheduling.
inference, gpu, grace-hopper, +6