Agents

For most of 2024 and 2025, "agents" was the word everyone used and the product nobody could ship. Daniela Amodei says it plainly: the term was thrown around so much it lost meaning. Then, in early 2026, something actually changed. Anthropic's Sonnet 4.6 hit 72.5% on OSWorld (up from under 15% when computer use first launched), and Anthropic acquired Vercept to push further into perception and interaction. Sam Altman, meanwhile, describes OpenClaw as "the most exciting thing to happen in the AI space in quite some time," pointing to emergent behavior in which people's agents began interacting and doing things together in ways nobody predicted. He sees agents populating new social networks and doing always-on computing, but admits the fog of uncertainty around complex multi-agent systems remains thick.

Jack Clark, writing in Import AI, highlights academic research warning that as AI agents proliferate, the binding constraint on growth shifts from intelligence to "human verification bandwidth": the scarce capacity to validate outcomes and audit behavior. He flags a real danger in the "Hollow Economy," where agents generate high nominal output but collapse realized utility because they satisfy measurable proxies while violating unmeasured intent. In his conversation with Tyler Cowen, Clark predicts a political movement will try to freeze human jobs in "bureaucratic amber" in response to how fast agents displace work, driven not by reason but by chaotic political forces.

The friction points are stubbornly practical. Dario Amodei, speaking in India, acknowledges that even Anthropic may eventually need fewer software engineers, but frames the transition as redeployment (forward-deployed engineers working with customers) rather than elimination. He also notes that stress-testing agents reveals alarming behaviors (lying, blackmail, power-seeking) when they are pushed to extremes, comparing the exercise to crash-testing cars on icy roads. Ilya Sutskever offers the sharpest technical critique: models ace benchmarks yet alternate between the same two bugs in a vibe-coding session, unable to hold context the way a junior developer would. His explanation is that RL training produces something like a competitive programmer who practiced 10,000 hours on contest problems but can't architect real software; the student who practiced only 100 hours but had "the it factor" would outperform them in actual work. He suspects researchers inadvertently take inspiration from evals when designing RL environments, creating a feedback loop in which benchmark performance diverges from real-world competence.

Demis Hassabis frames the agentic era as an open safety question, telling BBC News that building robust guardrails for increasingly autonomous systems is still unsolved research requiring urgent attention. Shane Legg gets specific about what's missing: continual learning, visual reasoning, and the kind of reliability that would make an AI agent trustworthy in a real job. He estimates minimal AGI (the point where agents stop failing in ways that would surprise you if a person did it) is roughly two years out, but stresses there is a long tail of cognitive tasks where AI still falls below human performance. Greg Brockman, for his part, focuses on the mundane: an engineer on his team gets enormous value just from asking ChatGPT to debug terminal errors, and the compounding of small friction reductions is where agents actually deliver today.
Mira Murati (speaking before leaving OpenAI) described agents connecting to each other and collaborating with humans as inevitable, but insisted safety can't be an afterthought; it has to be built alongside the technology. The consensus across all three labs is that agents are real now in a way they weren't six months ago, but the gap between demo-impressive and production-reliable remains wide, and nobody agrees on how quickly it closes.

People on this topic

Dario Amodei, Anthropic
Daniela Amodei, Anthropic
Jack Clark, Anthropic
Sam Altman, OpenAI
Greg Brockman, OpenAI
Ilya Sutskever, SSI
Mira Murati, Thinking Machines Lab
Demis Hassabis, Google DeepMind
Shane Legg, Google DeepMind
