Hands-on test of Claude Opus 4.8 — this may be the first model that won’t slack off

Big money and steady polishing

It has been reported that Anthropic has closed an H round of financing worth $65 billion, giving the company a post-money valuation of $965 billion. The company says the funds will expand R&D and compute capacity to meet growing demand for Claude. Reportedly, Anthropic is also quietly developing lower-cost Opus-like models and a stronger family of models codenamed Mythos.

The launch is not billed as a leap so much as a hard polish. Opus 4.8, now rolling out in Claude Code 2.1.154 (and already live in app and desktop clients, with API access arriving later), focuses on judgment and “honesty” — meaning the model admits uncertainty instead of bluffing, flags its own code defects, and can sustain longer independent work. Anthropic says internal tests show the model is far less likely to pretend problematic code is OK; it “found and didn’t hide” its own defects about three quarters more often than Opus 4.7.

Developer features: long runs, more agents, and cheaper fast inference

The headline technical change is Claude Code’s dynamic workflow: Claude now plans a task and spins up hundreds of parallel subagents to work, verify, and recheck before reporting back. The company cites a Bun migration trial that touched roughly 750,000 lines of Rust, moved from first commit to merge in 11 days, and maintained 99.8% of the preexisting test suite — an example of organizing many agents rather than supercharging a single one. Effort-control (how much token/“thought” a reply consumes) is extended to claude.ai and Claude Cowork, while Claude Code defaults to a higher-effort mode for code tasks. A small but useful developer change: the Messages API now allows in-stream system messages, so long-running jobs can update system instructions without breaking context.

Pricing tweaks are notable. Base rates remain $5 per 1M input tokens and $25 per 1M output tokens; Fast mode is priced at $10/$50 but runs about 2.5× faster, and Anthropic says the new Fast tier is roughly one-third the price of the previous Fast offering.

Why it matters — trust, enterprise adoption, and the road to IPO

This release reiterates Anthropic’s narrative: capability must come with calibrated uncertainty and user-aligned behavior. Why does that matter? Enterprises adopt models when they are not just capable but predictable and honest about limits. That trust narrative also helps with partnerships and, reportedly, IPO positioning. Geopolitically, Anthropic sits in a crowded U.S. AI landscape under closer scrutiny from regulators and rivals; positioning Claude as safer and more reliable is as much a market play as a technical one. Can better honesty and long-running coordination turn into a durable commercial moat? That’s the bet behind Opus 4.8.