Scaling Managed Agents: Decoupling the brain from the hands
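A collection that makes the Anthropic Engineering blog available to read in Japanese. When a new article is published, GitHub Actions automatically fetches it, translates it, and converts it to PDF.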
Quantifying infrastructure noise in agentic coding evals
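Agentic coding benchmarks such as SWE-bench and Terminal-Bench are widely used to compare the software engineering capabilities of frontier models, and the top of their leaderboards is often separated by only a few percentage points. These scores are treated as precise measurements of the relative capability differences between models, and which model…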
Scaling Managed Agents: Decoupling the brain from the hands
An update on recent Claude Code quality reports
Claude Code auto mode: a safer way to skip permissions
Harness design for long-running application development
Eval awareness in Claude Opus 4.6's BrowseComp performance
Building a C compiler with a team of parallel Claudes
Designing AI-resistant technical evaluations
Demystifying evals for AI agents
Effective harnesses for long-running agents
Introducing advanced tool use on the Claude Developer Platform
Code execution with MCP: Building more efficient agents
Beyond permission prompts: making Claude Code more secure and autonomous
Equipping agents for the real world with Agent Skills
Effective context engineering for AI agents
A postmortem of three recent issues
Writing effective tools for agents — with agents
Desktop Extensions: One-click MCP server installation for Claude Desktop
How we built our multi-agent research system
Claude Code: Best practices for agentic coding
The "think" tool: Enabling Claude to stop and think in complex tool use situations
Raising the bar on SWE-bench Verified with Claude 3.5 Sonnet
Building effective agents
Introducing Contextual Retrieval