
Claude Opus 4.7 Is Here — Here's What Actually Changed

Anthropic dropped Claude Opus 4.7 today. Same price as 4.6. 13% better on coding benchmarks. 3× more production task resolutions on Rakuten's SWE-Bench. 90.9% on BigLaw Bench. A quiet release with loud numbers — here's what UK teams should know before they upgrade.

Erhan Timur · 16 April 2026 · Founder, Digital by Default

Anthropic dropped Claude Opus 4.7 today with the kind of release notes that look small on the surface but turn out to be significant once you read the benchmarks.

No dramatic new UI. No flashy agent demo. Just a model that's meaningfully better at the things you're already using Claude for — writing code, running agents, reading complex documents — at exactly the same price.

If you're using Opus 4.6 via the API or inside Cursor, Claude Code, Zed, Replit Agent, or one of a dozen other tools, your upgrade path is one identifier change away. But the substance of this release is worth understanding before you flip the switch, because some of the improvements are big enough to change how you deploy Claude in production, and at least one change (the tokenizer) might break existing assumptions about your cost model.
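If you pin model versions yourself via the Anthropic Python SDK, the identifier change really is the whole migration. A minimal sketch of the kwargs you would hand to `anthropic.Anthropic().messages.create`; note that only `claude-opus-4-7` comes from the release notes, and the 4.6 identifier below is assumed for illustration:

```python
# Hypothetical before/after identifiers: only the 4.7 string appears in
# the release notes; the 4.6 string is assumed for illustration.
OLD_MODEL = "claude-opus-4-6"
NEW_MODEL = "claude-opus-4-7"

def build_request(prompt: str, model: str = NEW_MODEL) -> dict:
    """Kwargs for anthropic.Anthropic().messages.create(**kwargs).
    Upgrading means changing the `model` value and nothing else."""
    return {
        "model": model,
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": prompt}],
    }
```

Tools like Cursor or Claude Code do this swap for you when they update their defaults; the sketch only matters if you hard-code model versions in your own stack.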

Here's what actually shipped.

The headline: a better coding model, not a new one

Anthropic's pitch for Opus 4.7 is "complex, long-running tasks with rigor and consistency." Translated: it finishes what it starts, it doesn't wander off, and it reads instructions more literally than any previous Claude.

The coding numbers back this up.

On a 93-task internal coding benchmark, Opus 4.7 is 13% better than 4.6 and solves four tasks that neither 4.6 nor Sonnet 4.6 could crack at all. CodeRabbit — the AI code-review platform — reports recall improvements of over 10% with stable precision, which in plain English means Opus 4.7 is catching more real bugs without generating more false positives. Rakuten ran it through their fork of SWE-Bench and got 3× more production task resolutions than on 4.6.

3× is the number that gets tossed around in marketing decks and usually means nothing. In this case it means something: Rakuten is measuring whether the model can fix a real bug in a real codebase and produce a PR that passes review. Tripling that rate is the difference between "helpful assistant" and "you can actually delegate this."

Teams building coding agents should take note. The models underneath products like Cursor, Claude Code, Replit Agent, Windsurf, and the long tail of coding copilots quietly got better today — sometimes dramatically, if the tool has flipped its default model. If you've been frustrated by an agent that loses the plot on a 20-minute task, the frustration level is about to drop.

Vision got a serious upgrade

One of the most under-sold parts of the release is the vision improvement. Opus 4.7 handles images up to 2,576 pixels on the long edge — about 3.75 megapixels, roughly 3× the resolution ceiling of previous Claude models.

Why does that matter? Two reasons.

First, computer-use agents. If you're building anything that takes screenshots of a browser or desktop and asks Claude to reason about them, you've been fighting resolution limits for months. High-density UIs — financial dashboards, CRMs, cloud consoles — had to be cropped or downsampled before Claude could read them, which meant the agent regularly missed buttons, misread numbers, or confidently clicked the wrong thing. That gap just closed.
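As a rough sketch of what the new ceiling means for a screenshot pipeline, assuming you resize client-side before sending (the 2,576 constant is from the release notes; the helper names are mine):

```python
MAX_LONG_EDGE = 2576  # Opus 4.7 long-edge limit from the release notes

def fits(width: int, height: int) -> bool:
    """True if the image can be sent without downsampling."""
    return max(width, height) <= MAX_LONG_EDGE

def scale_factor(width: int, height: int) -> float:
    """Factor to apply to both dimensions so the long edge fits."""
    return min(1.0, MAX_LONG_EDGE / max(width, height))
```

A 2560×1440 screenshot now fits untouched, where a 4K capture still needs roughly a one-third downscale.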

Second, specialist domains. Anthropic explicitly called out life-sciences patent workflows, which is code for "our users were extracting chemical structures and technical diagrams and it wasn't reliable enough." It is now. If your workflow involves parsing engineering schematics, medical imagery, lab notes, or dense technical PDFs — categories where resolution is a feature, not a nice-to-have — Opus 4.7 is a step change.

Instruction-following got sharper (and that might break your prompts)

Opus 4.7 interprets instructions "more literally." Anthropic is upfront about the tradeoff: prompts that worked fine on older models can now produce unexpected results, because the model is doing more precisely what you asked instead of what it guessed you meant.

If you have a production prompt that's been working for a year, it might start behaving differently tomorrow. That's not a bug — it's a sign the model is reading you more carefully. But it does mean you should regression-test any prompt you care about before rolling 4.7 out broadly.
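The regression test can be very light. A hedged sketch with the API call stubbed out so the harness runs offline; `run_model`, the golden prompts, and the keyword checks are all illustrative, not a real suite:

```python
def run_model(prompt: str, model: str) -> str:
    # Stub standing in for a real anthropic.Anthropic().messages.create
    # call; swap in the live call when wiring this to your account.
    return "This invoice totals £1,200, due within 30 days."

# Golden prompts mapped to keywords their output must contain.
GOLDEN = {
    "Summarise this invoice in one sentence.": ["invoice", "due"],
    "Extract the VAT number.": ["vat"],
}

def regression_check(model: str) -> list[str]:
    """Return the prompts whose output lost an expected keyword."""
    failures = []
    for prompt, must_contain in GOLDEN.items():
        out = run_model(prompt, model).lower()
        if not all(kw in out for kw in must_contain):
            failures.append(prompt)
    return failures
```

Run it once against 4.6 and once against 4.7: any prompt that passes on the old model and fails on the new one is where the stricter literalism bit you.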

This is, counterintuitively, good news for agent builders. The single biggest source of failure in multi-step agent workflows has always been models drifting off-script. A model that follows instructions more literally is easier to build reliable agents on top of — which is exactly where the Rakuten 3× number is coming from.

The numbers that should move your roadmap

Four industries got specific callouts:

Finance. 0.813 on Anthropic's General Finance evaluation, up from 0.767 on Opus 4.6. Combined with the vision upgrade, this is the category where the step-change is likely biggest in day-to-day work — a lot of finance is spreadsheets, charts, and PDFs you want parsed reliably.

Legal. Harvey — used by a lot of Magic Circle firms in London — reports 90.9% accuracy on BigLaw Bench at high effort. That's a genuine step toward "junior associate work, done reliably." UK law firms already piloting Harvey, Robin AI, or Legora are about to see noticeably better output this week.

Research agents. Top overall score (tied) of 0.715 across six modules in Anthropic's research-agent benchmarks. If you're building anything that reads a lot of source material and synthesises it — competitive intelligence, due diligence, market research — this is your model.

Document reasoning. 21% fewer errors than previous versions on long-document tasks. Any workflow that's been plagued by Claude hallucinating on lengthy PDFs is worth re-running on 4.7.

For UK SMEs, the legal and finance numbers are the most interesting signal. Specialist AI tools targeting those industries are going to get quietly better this month as their underlying models update. If you're buying those tools, your ROI just improved without you doing anything.

The catches

Four things to know before you upgrade:

The tokenizer changed. Opus 4.7 uses a new tokenizer that's more efficient on some text types and less efficient on others. Identical inputs can consume anywhere from 1.0× to 1.35× the tokens they did on 4.6. Anthropic says net token usage trends favourable on coding workloads across effort levels, but your mileage on other workloads may vary. Track it. If you have a high-volume production workload, run a comparison over a weekend before committing.
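That weekend comparison can be as simple as counting tokens for a sample of your real inputs under each model (the Anthropic API exposes a count-tokens endpoint for this) and looking at the ratio. The figures below are made up for illustration, not measurements:

```python
INPUT_PRICE_PER_M = 5.00  # $ per million input tokens, both models

def cost_ratio(tokens_46: int, tokens_47: int) -> float:
    """How much the same input costs on 4.7 relative to 4.6."""
    return tokens_47 / tokens_46

# Illustrative counts only: prose may tokenize worse, code better.
samples = {
    "contracts_batch": (10_000, 12_500),
    "python_repo": (8_000, 7_400),
}
for name, (t46, t47) in samples.items():
    print(f"{name}: {cost_ratio(t46, t47):.3f}x")
```

Anything consistently above 1.0× on your dominant workload is a real price rise hiding behind "same price per token".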

A new effort level. Opus 4.7 introduces an "xhigh" effort setting sitting between "high" and "max" — finer-grained control over the reasoning-vs-latency tradeoff. Useful if you've been bouncing between "too slow" and "not thorough enough" on your current defaults.

Task budgets, in public beta. Developers can now set token budgets to guide spend across longer runs. If you're deploying agents that sometimes get chatty, this is the lever you wanted.
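Until the beta fits your stack, the same idea is easy to approximate client-side by accumulating the `usage` figures each API response returns. A sketch, with the class and threshold mine rather than the beta's actual API:

```python
class TokenBudget:
    """Client-side approximation of a task budget: add up usage from
    each response and stop the run once the cap is reached."""

    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.used = 0

    def record(self, input_tokens: int, output_tokens: int) -> None:
        # Mirrors response.usage.input_tokens / output_tokens in the SDK.
        self.used += input_tokens + output_tokens

    @property
    def exhausted(self) -> bool:
        return self.used >= self.max_tokens
```

An agent loop checks `budget.exhausted` before each step and winds down gracefully instead of burning through a chatty run.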

Cybersecurity safeguards. The model now automatically detects and blocks high-risk cybersecurity requests. Legitimate pen-testers and vulnerability researchers have to join Anthropic's Cyber Verification Program to unlock that class of requests. Not a concern for most users, but worth flagging if you work in security.

Pricing and availability

Same price as Opus 4.6: $5 per million input tokens, $25 per million output tokens. No tier change, no surprise uplift.

Available everywhere:

  • Claude products and the Claude API (model identifier: `claude-opus-4-7`)
  • Amazon Bedrock
  • Google Cloud Vertex AI
  • Microsoft Foundry

Claude Code users get a new `/ultrareview` command for dedicated code-review sessions, with three free sessions bundled for Pro and Max subscribers.

Should you upgrade today?

If you're running coding agents, research agents, or computer-use agents — yes, after a regression test on your key prompts. The improvements are substantial enough to justify the switch, and the API identifier change is trivial.

If you're using Claude primarily for writing, customer-facing content, or simple Q&A — there's no rush. Opus 4.6 still works fine for those use cases; the headline improvements in 4.7 target complex, long-running work.

If you're a vendor whose product relies on Claude — update your default model. Your users will get better output without knowing you changed anything, and the benchmarks suggest nobody's going to complain about the swap.

The quiet release style matches the upgrade. This isn't Anthropic launching a new category. It's Anthropic doing what the rest of the industry keeps promising and mostly fails to deliver: shipping a model that's substantively better at production work without making you relearn anything or pay more for it. That's worth noticing, even without the launch video.


Want to find the AI tools already running on the newest models? Browse the marketplace by category, or head to /community to see what other UK operators are pinning to their stacks. When a model like Opus 4.7 lands, the serious buyers update their picks within the week — the easiest way to track what's actually working is to watch what they're curating.

Anthropic · Claude · Claude Opus 4.7 · Coding Agents · LLM · AI News · 2026

Enjoyed this article?

Subscribe to our Weekly AI Digest for more insights, trending tools, and expert picks delivered to your inbox.