Mercury
World's first commercial diffusion LLM — 1,000+ tokens per second
About
Mercury by Inception Labs is the first commercial-scale diffusion large language model, fundamentally changing how AI generates code and text. Rather than predicting tokens one at a time, Mercury generates a coarse draft and then refines many tokens in parallel, reaching 1,000+ tokens per second on NVIDIA H100s, five times faster than speed-optimized autoregressive models. Mercury 2, launched in February 2026, adds full reasoning capabilities while maintaining sub-two-second latency. On Copilot Arena, Mercury Coder Mini ties for second place, outperforming GPT-4o Mini and Gemini Flash. It is a strong fit for teams with high-throughput coding needs, CI pipelines, or latency-sensitive IDE integrations where response speed directly affects developer flow.
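The throughput difference comes down to model calls per sequence: an autoregressive decoder needs one call per token, while a diffusion decoder refines every position on each call. The toy sketch below illustrates only that counting argument; the `next_token` and `denoise` stand-ins are hypothetical placeholders, not Mercury's actual model or API.

```python
def autoregressive_generate(n_tokens, next_token):
    """Left-to-right decoding: one model call per generated token."""
    seq, calls = [], 0
    while len(seq) < n_tokens:
        seq.append(next_token(seq))
        calls += 1
    return seq, calls

def diffusion_generate(n_tokens, denoise, n_steps=4):
    """Coarse-to-fine decoding: each call refines ALL positions at once."""
    seq, calls = ["<mask>"] * n_tokens, 0
    for _ in range(n_steps):
        seq = denoise(seq)  # updates every position in parallel
        calls += 1
    return seq, calls

# Toy stand-ins for a real model, for illustration only.
next_token = lambda seq: f"tok{len(seq)}"
denoise = lambda seq: [f"tok{i}" for i in range(len(seq))]

ar_out, ar_calls = autoregressive_generate(16, next_token)
df_out, df_calls = diffusion_generate(16, denoise, n_steps=4)
print(ar_calls, df_calls)  # 16 sequential calls vs 4 parallel refinement steps
```

Because each refinement step touches the whole sequence, the number of model calls scales with the (small, fixed) step count rather than with output length, which is where the latency advantage comes from.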
Key Features
Integrations
Reviews
No reviews yet.
Related Reading
CrewAI Hit 47.8K Stars and 2 Billion Agent Runs — The Multi-Agent Question You Can't Keep Dodging
Spec-Driven AI Coding — Why Kiro's 'Describe First' Workflow Breaks the Cursor Pattern
Windsurf in 2026 — The AI Code Editor That Cursor Should Be Worried About
More in Developer Tools
Agentic AI for test generation, code review, and security analysis
AI coding assistant with codebase context
AI-powered code snippet manager and workflow copilot
AI pair programmer that writes code with you
AI-powered UI generation from text descriptions
Spec-driven agentic IDE for shipping production-ready code