SFT and Online RL for Visual Generation: How We Built CoSprite's Training Pipeline
How Cumulus built a production pipeline for consistent AI-generated game previews using best-of-N sampling, deterministic rendering, pairwise judging, supervised fine-tuning, and online reinforcement learning with GRPO.