Gemma 4: Why Google's Licence Change Matters More Than the Benchmarks
Google DeepMind released Gemma 4 under Apache 2.0 — the first time a Gemma model has shipped with a truly permissive commercial licence. The 31B model ranks third among all open models, but the licence shift is what actually unlocks enterprise adoption.

Google DeepMind dropped Gemma 4 today, and if you skimmed the coverage you might have filed it under "another open-weight model release." That would be a mistake.
Yes, the performance numbers are impressive. Yes, the 26B Mixture-of-Experts model reportedly outcompetes models twenty times its size. Yes, the 31B dense model sits third on the global open-model leaderboard. But the most significant thing about Gemma 4 has nothing to do with benchmarks.
It is the licence.
The Licence Problem Nobody Talked About
For two years, the open-source AI community had a frustrating relationship with Google's Gemma models. They were technically excellent — often best-in-class for their parameter count. But the custom Google licence that shipped with every version created a layer of legal ambiguity that enterprise teams simply could not stomach.
The terms contained "Harmful Use" carve-outs that required interpretation. Google retained the right to update them unilaterally. Using Gemma in a commercial product meant a legal review — sometimes multiple rounds — that turned what should have been a technical decision into a procurement process. Most teams just went with Llama or Qwen instead and never looked back.
Gemma 4 ships under Apache 2.0.
Same licence as Mistral. Same licence as Qwen. Same licence as most of the models your team is probably already using. No restrictions on commercial deployment, redistribution, fine-tuned derivatives, or sublicensing. No legal review needed. You build with it, you ship with it, done.
VentureBeat called this change out directly: the licence shift "may matter more than benchmarks." They are right. For any business that was evaluating Gemma 3 and concluded the legal friction was not worth it — that friction is gone.
What Gemma 4 Actually Is
Four models, two deployment profiles.
Edge models — built for on-device:
- E2B (Effective 2B parameters): image, video, and audio input; 128K context; built for Android and low-power devices
- E4B (Effective 4B parameters): same multimodal capability, slightly more headroom; 128K context
Cloud/workstation models — built for serious inference:
- 26B Mixture of Experts: 256K context; image and video input; the efficiency story — active parameter count is a fraction of total, which is why it punches above its weight class
- 31B Dense: 256K context; image and video input; sits third on the global open-model arena leaderboard at time of release
"Effective" in the edge model names refers to active parameters during inference — MoE-style selective activation that preserves RAM and battery life without sacrificing output quality. On a phone, this is the difference between a tool that is practical and one that drains your battery and heats your hand.
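The arithmetic behind "effective" parameters can be shown with a toy sketch. The numbers below are hypothetical (Gemma 4's router configuration is not something this article specifies); they only illustrate why per-token active parameters can be a small fraction of the total.

```python
# Toy illustration of Mixture-of-Experts selective activation.
# All sizes are made up for illustration; they show why "effective"
# (active) parameters are far smaller than total parameters.

def active_fraction(num_experts: int, top_k: int,
                    expert_params: int, shared_params: int) -> float:
    """Fraction of total parameters touched per token when a router
    sends each token to only top_k of num_experts experts."""
    total = shared_params + num_experts * expert_params
    active = shared_params + top_k * expert_params
    return active / total

# e.g. 64 experts, 2 active per token, 300M params per expert,
# 2B shared (attention, embeddings): only a thin slice runs per token.
frac = active_fraction(num_experts=64, top_k=2,
                       expert_params=300_000_000,
                       shared_params=2_000_000_000)
print(f"active fraction per token: {frac:.1%}")  # roughly 12%
```

On a phone, that active fraction is what actually determines RAM pressure and energy draw, which is why the model names advertise effective rather than total size.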
All four models are natively multimodal from the ground up, not retrofitted. The vision encoder supports variable aspect ratios. The audio architecture borrows from Google's USM conformer work. Documents, screenshots, charts, handwriting, multilingual OCR — all handled without additional tooling.
The Agentic Piece
This is where it gets genuinely interesting for anyone building AI workflows.
Gemma 4 ships with native agentic capabilities built in — function calling, structured JSON output, multi-step planning, and tool use. Not as add-ons. Not requiring fine-tuning. Out of the box.
For context: most open models need additional training or prompt engineering gymnastics to reliably call functions and return structured output. Gemma 4 treats this as a baseline capability. You can drop it into an agent pipeline — using n8n, LangChain, or a custom implementation — and it will handle tool use correctly without special treatment.
The 256K context window on the large models means you can pass an entire codebase, a long contract, or a multi-document research brief in a single prompt. Combined with native function calling, this makes the 31B model a credible foundation for complex autonomous workflows on private infrastructure.
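To make the pipeline idea concrete, here is a minimal tool dispatcher of the kind an agent framework runs around the model. The JSON shape, tool name, and arguments are illustrative assumptions, not Gemma 4's documented output format; in a real pipeline the hard-coded string would be replaced by the model's structured output.

```python
import json

# Hypothetical tool registry for an agent pipeline. The model is expected
# to emit a JSON object naming a tool and its arguments; the dispatcher
# looks the tool up and executes it.

def get_weather(city: str) -> str:
    # Stand-in for a real API call.
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def dispatch(model_output: str) -> str:
    """Parse a structured function call and execute the named tool."""
    call = json.loads(model_output)
    fn = TOOLS[call["name"]]        # KeyError here means an unknown tool
    return fn(**call["arguments"])  # TypeError here means bad arguments

# Simulated model output (in production this comes from the LLM):
model_output = '{"name": "get_weather", "arguments": {"city": "Berlin"}}'
print(dispatch(model_output))  # Sunny in Berlin
```

The point of "native" function calling is that the model reliably produces the left-hand side of this exchange without fine-tuning, so the dispatcher on the right stays this simple.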
And that matters for a specific category of business that the cloud models do not serve well: organisations where data cannot leave the building. Healthcare providers with patient records. Law firms with client confidentiality obligations. Financial services with data residency requirements. Gemma 4, running on-premises on AMD or NVIDIA hardware with day-0 optimisation support from both vendors, gives these organisations genuine frontier-class reasoning without a cloud dependency.
The Benchmark Reality
Let us be honest about where Gemma 4 sits competitively.
The 31B model ranks third among all open models on the LMSYS Arena leaderboard. It scores 89.2% on AIME 2026 (mathematics), 84.3% on GPQA Diamond (science), and 80.0% on LiveCodeBench v6 (coding).
Strong numbers. Not the frontier.
The honest competitive picture is that Chinese open-weight models have surged. Qwen 3.5 from Alibaba, GLM-5 from Zhipu AI, and Kimi K2.5 from Moonshot AI all edge out Gemma 4 31B on top benchmarks. Qwen has overtaken Llama as the most-used self-hosted model globally — a remarkable shift from 1.2% of open-model usage in late 2024 to around 30% by end of 2025.
Gemma 4 is Google re-entering a race it is not yet leading. But Google's advantages are distribution, hardware partnerships, developer tooling, and Android integration — none of which appear on a benchmark table.
For most practical business use cases — document processing, customer service automation, coding assistance, content generation — the 31B model is more than capable. The question of whether it scores 89% or 91% on a maths olympiad is not the question your finance team cares about.
Who Should Actually Pay Attention
Developers building on Android. The E2B and E4B models are integrated into Android's AICore Developer Preview. This is Google making on-device LLM inference a standard Android primitive — not a research project. If you are building mobile applications and you want AI that works offline, does not send data to a cloud, and runs on the device your users already carry, Gemma 4 is the most practical option available today.
Teams that self-host for compliance. Apache 2.0 plus on-prem hardware from NVIDIA (RTX to Blackwell) and AMD (Instinct GPUs, Radeon workstations, Ryzen AI PCs) means Gemma 4 is deployable on your own infrastructure with first-class vendor support. NVIDIA and AMD both published optimisation blogs on day zero. This is not community support. It is commercial readiness.
Agent builders who need reliable function calling. If you are assembling AI workflows — whether in n8n, a custom Python stack, or a commercial automation platform — Gemma 4's native structured output and function-calling capability is meaningfully better than retrofitted alternatives. Fewer prompt engineering workarounds. More predictable output. Faster iteration.
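Even with dependable structured output, production workflows usually validate before acting. A minimal stdlib-only guard, with hypothetical field names, might look like this; the error messages are written so they can be fed back to the model as a retry prompt.

```python
import json

# Hypothetical schema for a workflow step's expected model output.
REQUIRED = {"action": str, "confidence": float}

def validate(raw: str) -> dict:
    """Parse model output and check required fields and types,
    raising ValueError with a message usable in a retry prompt."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}")
    for field, expected_type in REQUIRED.items():
        if field not in obj:
            raise ValueError(f"missing field: {field}")
        if not isinstance(obj[field], expected_type):
            raise ValueError(f"{field} must be {expected_type.__name__}")
    return obj

result = validate('{"action": "summarise", "confidence": 0.92}')
print(result["action"])  # summarise
```

"Fewer workarounds" in practice means this guard rarely fires, not that you get to delete it.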
Multilingual products. 140+ languages natively trained, not translated. If you are building for non-English markets, or serving a diverse customer base, this is one of the most capable open-weight multilingual models available.
Sovereign AI initiatives. Governments and regulated entities building national AI infrastructure need models that are commercially licensed, offline-capable, and deployable on domestic hardware. Gemma 4 checks all three boxes in a way that previous Gemma versions did not.
The Context That Is Worth Keeping
Three things are happening simultaneously that make this release more significant than a normal model update:
First, Google has joined the permissive open-weight ecosystem properly for the first time. Meta, Mistral, Alibaba, and now Google are all shipping commercially usable open models. This raises the quality floor for the entire category and makes "open source AI" a genuine enterprise option rather than a hobbyist one.
Second, the MoE efficiency story continues to accelerate. The 26B model competing with 500B+ parameter models is not a fluke. Architecture innovation — not just scaling — is now the primary competitive lever in AI development. This is good for businesses because it means capable models that run on affordable hardware, not just on data centre clusters.
Third, on-device AI is becoming real. The combination of Gemma 4 E2B/E4B, Android AICore integration, and Qualcomm/MediaTek hardware acceleration means that the gap between cloud AI and on-device AI is narrowing faster than most people expected. For applications where latency and privacy matter — which is most business applications — this trajectory is significant.
The Practical Question
If you are currently running Llama 3.3 or Qwen 2.5 in your self-hosted infrastructure, Gemma 4 is worth a serious evaluation. The Apache 2.0 licence removes the historical blocker. The hardware support means you will not be configuring undocumented workarounds. The agentic capabilities are production-ready.
If you have been evaluating whether to bring AI inference in-house — whether for cost, compliance, or control reasons — Gemma 4 makes that decision more tractable than it was yesterday.
If you are an Android developer who has been waiting for on-device LLM inference to be stable enough to build on, the AICore Developer Preview is worth your time this month.
The headline claim from Google is "byte for byte, the most capable open models." Whether that is precisely true is a matter for benchmark enthusiasts. What is not in dispute is that Gemma 4 is a commercially viable, production-ready, legally clean open-weight model family — and that combination did not exist from Google six months ago.
That is the real news.