Cumulus
Tag

#fast-gpu

1 article tagged with "fast-gpu"

February 17, 2026 · 6 min read

Ionattention: Grace Hopper–Native Inference

How Cumulus built the fastest GPU inference runtime for NVIDIA GH200 — 7,167 tok/s on a single chip. Coherent CUDA graphs, eager KV writeback, and phantom-tile attention scheduling.

#inference #gpu #grace-hopper +6


© 2026 Cumulus Compute Labs Corporation. All rights reserved.