Anthropic ships a thinking tool. The real product is the frame.
Claude’s new Think mode isn’t about better answers — it’s about redefining what the chat interface is supposed to do.
On May 22, Anthropic released Think, a new mode for Claude that surfaces extended reasoning traces before delivering an answer. The feature itself is straightforward: you toggle it on, Claude shows its work, you see the chain of thought that led to the output. What matters is not the feature but the positioning Anthropic wrapped around it. The company is no longer selling Claude as an answer engine. It is selling Claude as a space to think.
This is a reframe with teeth. The dominant mental model for chat interfaces since ChatGPT launched has been oracle-shaped: you ask, it answers, you move on. The loop is transactional. The value is speed. The interface design reflects that — a single input box, a single response, minimal affordance for iteration or revision. Anthropic is now saying that model is wrong for knowledge work. The real value is not the answer. It is the scaffold the model provides while you figure out what you are trying to say.
Think mode makes that scaffold visible. When you enable it, Claude does not jump straight to a conclusion. It writes out the reasoning process: the alternatives it considered, the dead ends it hit, the structure it used to organize the problem. The capability is not new. OpenAI’s o1 and o3 models already do extended reasoning, and DeepSeek-R1 brought the same approach to an open-weight model. What is new is the framing. Anthropic is not positioning this as “better reasoning” or “smarter answers.” It is positioning it as a tool for practitioners who need to think alongside the model, not just extract outputs from it.
The practitioner instruction here is specific: if your workflow involves drafting, structuring, or working through ambiguous problems, you should test Think mode this week. Not because it will give you better answers — though it might — but because it changes the interaction pattern. You are no longer asking Claude to solve the problem for you. You are asking it to show you how it would approach the problem, so you can decide whether that approach is worth stealing. The value is in the exposure to a different reasoning path, not the final output.
The skeptical read is that this is just o1 with better marketing. And there is truth to that. The underlying capability — extended chain-of-thought reasoning — is not novel. What Anthropic has done is take that capability and wrap it in a product story that makes sense for how practitioners actually work. OpenAI positioned o1 as “the model that can do hard problems.” Anthropic is positioning Think as “the tool that helps you do hard thinking.” The difference matters because the second framing changes what you expect the interface to do. You stop treating it like a vending machine. You start treating it like a whiteboard.
The counter-argument is that most users do not want to see the reasoning trace. They want the answer. And for many use cases, that is correct. If you are using Claude to summarize a document or generate boilerplate code, the reasoning trace is noise. But Anthropic is not optimizing for those use cases anymore. They are optimizing for the practitioner who is stuck — the writer who cannot figure out how to structure an argument, the designer who cannot decide between two directions, the developer who is debugging a problem they do not fully understand. For those workflows, the reasoning trace is not noise. It is the product.
This reframe also explains why Anthropic has been investing so heavily in context window expansion and artifact management. If Claude is a space to think, then the interface needs to support messy, iterative workflows. You need to be able to dump in half-formed ideas, reference previous drafts, and keep multiple threads open without losing context. Think mode is one piece of that stack. The real bet is that the chat interface can evolve into something closer to a workbench than a search bar.
The honest critique is that this framing only works if the reasoning traces are actually useful. If they are too verbose, too generic, or too confident in bad directions, practitioners will turn the feature off and go back to the transactional loop. Anthropic has not published benchmarks on how often Think mode produces reasoning that changes the user’s approach versus reasoning that just confirms what they already thought. That metric matters more than accuracy scores, because the value proposition here is not correctness — it is perspective shift.
What should you do this week? If you work in a domain where the hardest part is not finding the answer but figuring out the question, turn on Think mode and use it for one real problem. Not a test case. A real problem where you are genuinely stuck. See if the reasoning trace gives you a new angle. If it does, the feature is working as designed. If it does not, the feature is just o1 with a better tagline.
The reframe is the product. The tool is just the proof.
Vibe Remote Agents Mistral Medium 3.5
Mistral ships Medium 3.5 with remote agents in Vibe — the European lab's bid for the Cursor-class developer stack.
+ SHIP
SOURCE · QWEN BLOG
Qwen Image Edit
Qwen ships an image editing model with quality claims but no public access — the demo-without-distribution pattern continues.
WATCH
Quality (LMArena, last 12 weeks)
| Model | Org | ELO | Δ 7d | 12-week trend |
|---|---|---|---|---|
| Claude Opus 4.6 Thinking | Anthropic | 1502 | — | |
| Claude Opus 4.7 Thinking | Anthropic | 1500 | — | |
| Claude Opus 4.6 | Anthropic | 1498 | — | |
| Claude Opus 4.7 | Anthropic | 1492 | — | |
| Muse Spark | Meta | 1489 | — | |
Anthropic holds the top four spots this week: Claude Opus 4.6 Thinking (1,502 ELO), Opus 4.7 Thinking (1,500), Opus 4.6 (1,498), and Opus 4.7 (1,492), with Meta’s Muse Spark (1,489) breaking into the top 5. All four Anthropic models are priced identically at $5 / MTok input, $25 / MTok output, $20 blended. Muse Spark has no public pricing yet, which makes it the only top-tier model you can’t actually buy. The Opus sweep is notable less for the ELO spread (10 points separate first from fourth) than for the fact that Anthropic now owns the entire conversation about what “best” costs. If you want top-5 quality, you’re paying $20 blended or you’re waiting for Meta to ship an API.
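To put the 10-point spread in perspective, here is a minimal sketch that converts a rating gap into an expected head-to-head win rate. It assumes the leaderboard scores behave like classic Elo ratings on the standard 400-point logistic scale; that is an assumption for illustration, not something the leaderboard figures above confirm, and `expected_win_rate` is a hypothetical helper name.

```python
def expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Expected share of head-to-head wins for A over B.

    Assumes the classic Elo logistic curve with a 400-point scale.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Ratings taken from the table above: first place vs. fourth place.
first_place = 1502
fourth_place = 1492

win_rate = expected_win_rate(first_place, fourth_place)
print(f"Expected win rate for first over fourth: {win_rate:.3f}")
```

Under that assumption, a 10-point gap works out to an expected win rate of roughly 51%, which is close to a coin flip between first and fourth place.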
Pricing ($ per million tokens)
| Model | Input $/MTok | Output $/MTok | Blended |
|---|---|---|---|
| Claude Opus 4.6 Thinking | $5.00 | $25.0 | $20.0 |
| Claude Opus 4.7 Thinking | $5.00 | $25.0 | $20.0 |
| Claude Opus 4.6 | $5.00 | $25.0 | $20.0 |
| Claude Opus 4.7 | $5.00 | $25.0 | $20.0 |
| Muse Spark | — | — | — |
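The table does not say how the blended figure is weighted. As a rough check, the sketch below computes a weighted average of the input and output prices. The token mixes are assumptions chosen for illustration, and `blended_price` is a hypothetical helper: a mix of 1 input token to 3 output tokens reproduces the listed $20, while a 3:1 mix would give $10 instead.

```python
def blended_price(input_price: float, output_price: float,
                  input_share: float) -> float:
    """Weighted average $/MTok, given the fraction of tokens that are input."""
    if not 0.0 <= input_share <= 1.0:
        raise ValueError("input_share must be between 0 and 1")
    return input_share * input_price + (1.0 - input_share) * output_price


# Opus list prices from the table, in $ per million tokens.
OPUS_INPUT = 5.00
OPUS_OUTPUT = 25.00

# Assumed mix: 1 input token for every 3 output tokens.
output_heavy = blended_price(OPUS_INPUT, OPUS_OUTPUT, input_share=0.25)

# Alternative assumed mix: 3 input tokens for every 1 output token.
input_heavy = blended_price(OPUS_INPUT, OPUS_OUTPUT, input_share=0.75)

print(f"1:3 input:output -> ${output_heavy:.2f} per MTok")
print(f"3:1 input:output -> ${input_heavy:.2f} per MTok")
```

The practical point is that a blended number depends on the workload. Prompt-heavy jobs such as summarizing long documents will land well below $20, so it is worth recomputing the figure with your own input-to-output ratio.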
GWM-1
A state-of-the-art General World Model built to interact with the real world. And a major step towards universal simulation.
Runway ships GWM-1, a world model for real-world interaction — light on benchmarks, heavy on ambition.
+ SHIP
SOURCE · ELEVENLABS BLOG
Introducing Agent Workflows
ElevenLabs ships workflow orchestration for voice agents — the missing layer between single-turn demos and production conversational systems.
+ SHIP
How VCs and founders use inflated ‘ARR’ to crown AI startups
ARR theater: AI startups count API spikes and pilots as recurring revenue, and VCs know it.
WATCH
SOURCE · TECHCRUNCH · LAYOFFS
Intuit to lay off over 3,000 employees to refocus on AI
Intuit cuts 3,000+ to 'refocus on AI' — the layoff-as-pivot memo enters its third year.
WATCH
Flux1.1 [pro] ultra
Black Forest Labs' new 4MP image model ships today—finally beats Midjourney on raw resolution without the usual upscale artifacts.
MCP Inspector
Official debugging UI for MCP servers—shows live tool calls, resource fetches, and prompt templates in one local dashboard.
Jina Reader API v2
Converts any URL to clean markdown in one API call—now handles paywalls, JavaScript-heavy sites, and returns structured JSON for RAG.