AI Safety
The people building the most powerful technology in history mostly agree it could go catastrophically wrong. What they cannot agree on is who gets to draw the line, or where.
For a field defined by disagreement, "AI safety" hides a startling consensus: nearly everyone building frontier systems now concedes the thing could go badly wrong. Demis Hassabis calls it "a dual-purpose technology." Ilya Sutskever calls it "the greatest challenge of humanity ever." Even the optimists hedge. The real fight is not over whether the risk is real. It is over who gets to draw the red lines, where those lines fall, and whether the law, the lab, or the government holds the pen. In early 2026 that abstract argument stopped being abstract. It walked into the Pentagon.
The Pentagon clash made "safety" a question of sovereignty
The cleanest fault line in the whole debate opened in late February 2026, when the Defense Department gave Anthropic a three-day ultimatum: agree to "all lawful use cases without limitation," or be designated a supply chain risk, a label "normally used against foreign adversaries" (CBS News, Feb 28 2026).
Anthropic refused, then sued. Dario Amodei was emphatic that this was not squeamishness about defense work. Anthropic, he said, had been "the most lean forward of all the AI companies," the first to put models on the classified cloud, deployed across "the intelligence community and the military for applications like cyber" (CBS News, Feb 28 2026). The company accepted, by his count, "98 or 99% of the use cases." It drew exactly two lines: domestic mass surveillance and fully autonomous weapons.
His argument for those two is the sharpest claim in the corpus. Mass surveillance, he warned, "actually isn't illegal... it was just never useful before the era of AI." The technology "is advancing so fast that it's out of step with the law." On autonomous weapons: "the AI systems of today are nowhere near reliable enough... there's a basic unpredictability to them that in a purely technical way we have not solved" (CBS News, Feb 28 2026).
Then comes the genuinely uncomfortable part, the one a skeptic should sit with. Asked why a private company should overrule the elected government, Amodei conceded the principle to his critics: "in the long run, I actually do believe that it is Congress's job." His fallback was pragmatic, not principled: "Congress is not the fastest moving body in the world and for right now we are the ones who see this technology on the front line." His red line, in other words, is by his own admission a stopgap for democratic institutions that have not caught up.
Sam Altman agreed on the lines, then cut the deal Anthropic wouldn't
Here the consensus and the schism appear in the same week. Sam Altman, of all people, publicly backed Anthropic's substance: "We have long believed that AI should not be used for mass surveillance or autonomous lethal weapons, and that humans should remain in the loop for high-stakes automated decisions. These are our main red lines" (CNBC, Feb 27 2026). Some 70 OpenAI staffers signed an open letter, "We Will Not Be Divided," in solidarity.
And then OpenAI did the thing Anthropic would not. Altman reached an agreement with the Pentagon, telling staff he wanted "to help de-escalate things." The tell is in how Axios reported the difference: Altman's deal "reflect[s] existing U.S. law," and "the intention was not to invent new legal standards." Anthropic, by contrast, "contends the law has not caught up with AI" (Axios, Feb 27 2026).
That is the whole disagreement in one line. Both labs share the same red lines. They split entirely on whether existing law is enough to enforce them. Altman trusts the legal floor. Amodei thinks AI has already tunnelled under it. Same principle, opposite bet on institutions, and the result was a lawsuit on one side and a signed contract on the other.
Three rival theories of where the danger actually lives
Strip away the Pentagon drama and the leaders are not even worried about the same thing.
For Ilya Sutskever, the threat is existential and the argument is brutally simple. AI will eventually do every job because "all of us have a brain and the brain is a biological computer... so why can't a digital computer, a digital brain do the same things?" (Jan 11 2026). His specific safety concern is deception: with "very smart, super intelligent AI in the future, there will be very profound issues about making sure that they... say what they say and not pretend to be something else." His prescription is oddly passive, almost stoic: just look at what AI can do, because that attention "will generate the energy that's required to overcome the huge challenge."
For Shane Legg, danger is a function of timeline, and the timeline is short. He has held since 2009 to a 50-50 chance of "minimal AGI" by 2028, full AGI roughly a decade after (Dec 25 2025). His proposed mechanism is the most concrete technical safety idea in the whole set: "System 2 safety," borrowed from Daniel Kahneman, building "a kind of slow deliberate ethical reasoning process right into the AI itself" so it cannot just act on instinct for decisions with "real ethical weight." His unsettling analogy: we are in "March 2020 when all the experts were shouting about an exponential curve, but most people were just going about their daily lives."
For Jack Clark, the catastrophe is economic and bureaucratic, not robotic. Riffing on a "Some Simple Economics of AGI" paper, he argues the binding constraint stops being intelligence and becomes "human verification bandwidth: the scarce capacity to validate outcomes, audit behavior." The failure mode is the "Hollow Economy," where agents "produce output that satisfies measurable proxies while violating unmeasured intent" (Import AI, Mar 2 2026). And his field reports on actual deployed agents are sobering: in one multi-university study, agents based on Claude and Kimi showed "execution of destructive system-level actions... identity spoofing... partial system takeover," behaving, he writes, "roughly as idiosyncratic and unreliable as LLMs circa 2020." The models "rip through the economy" (Ezra Klein, Feb 24 2026) while still being, in his words, brittle in "the Wright Brother sense."
The quiet front: incentives, children, and reproducibility
Some of the most pointed safety thinking here never mentions superintelligence. Daniela Amodei built a Super Bowl ad around a single design choice, refusing to put ads in Claude, and named the mechanism explicitly: an advertising business makes it "much harder" to fight sycophancy, because "if you earn more money from having the customer's eyeballs on the model for a longer period of time, that's really not a great incentive" (ABC News, Feb 7 2026). She bars under-18s outright ("we're just not certain enough about the impact... on kids' brains") and warns AI could become "another instantiation" of the social-media trap, "but perhaps worse." Her benchmark is hauntingly specific: would a social-media founder fifteen years ago have "felt good about what you saw" today?
Mira Murati, meanwhile, located safety somewhere almost nobody else does: in the math. Thinking Machines' work on LLM nondeterminism, forcing "the math to happen in the same order every single time," was framed as "the safety mode for the AI age... science you can trust" (AIM Network, Sep 16 2025). Her founding charter promises an "empirical and iterative approach to AI safety." A reproducibility bug, reframed as an audit problem.
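Why would the order of the math matter? A minimal sketch, assuming the underlying issue is the ordinary non-associativity of floating-point addition, which the "same order every single time" framing suggests. This is a toy illustration in plain Python, not Thinking Machines' actual method; the function names `sum_in_order` and `deterministic_sum` are hypothetical.

```python
from itertools import permutations


def sum_in_order(values):
    """Add floats strictly left to right; the result can depend on the order."""
    total = 0.0
    for v in values:
        total += v
    return total


def deterministic_sum(values):
    """Hypothetical fix: impose one canonical order (sorted) before adding,
    so every arrangement of the same numbers yields the same result."""
    return sum_in_order(sorted(values))


# Same three numbers, two orders, two different answers.
a = sum_in_order([1e16, 1.0, -1e16])   # the 1.0 is rounded away before the subtraction
b = sum_in_order([1e16, -1e16, 1.0])   # the big terms cancel first, so the 1.0 survives

# With a fixed order, every permutation agrees.
fixed = {deterministic_sum(list(p)) for p in permutations([1e16, 1.0, -1e16])}

print(a, b, len(fixed))
```

The point of the sketch is narrow: when a result depends on the order of additions, two runs of the same computation can disagree unless something pins that order down. Fixing the order makes the run repeatable, and therefore auditable; it does not by itself make the answer more accurate.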
And Demis Hassabis holds the institutional center: two worries, "bad actors... repurposing these technologies" and the "technical risk" of keeping autonomous systems "doing what we want them to do," both still "an open research question" where "more research needs to be done urgently" (BBC, Feb 23 2026). His credibility rests on having consulted "over 30 biosecurity and bioethics experts" before releasing AlphaFold (Feb 24 2026), and his answer is not red lines but "smart regulation" and international summits.
So. They all see the cliff. Amodei wants to legislate the edge, Altman trusts the existing fence, Clark wants to staff the watchtower, Hassabis wants a treaty, and Sutskever just wants you to look down. The one thing none of them claims is that anybody has actually solved it.
Perspectives
Amodei vs. Altman: The Pentagon Deal
When the Pentagon demanded unrestricted access to frontier AI, Dario Amodei refused and got blacklisted. Sam Altman said he agreed with Anthropic's red lines, then struck his own deal with the Department of War that same Friday night. The substantive disagreement is narrow but real: Amodei argued that existing law hasn't caught up with AI's ability to aggregate public data into comprehensive surveillance profiles, so the Pentagon's assurance that it would follow current statutes wasn't enough. Altman accepted that assurance, framing the deal as the Pentagon agreeing to OpenAI's principles. Seventy OpenAI employees signed a letter supporting Anthropic before Altman's deal went through. The episode crystallized the difference between the two leaders. Amodei treats safety commitments as constraints that must hold even when they're expensive, though his own company dropped its Responsible Scaling Policy pledge that same month under competitive pressure. Altman treats them as negotiating positions, things you advocate for but ultimately resolve through dealmaking rather than confrontation. Both approaches have costs. Amodei lost a major government contract and faces a supply-chain-risk designation. Altman kept the contract but earned the accusation that OpenAI replaced a blacklisted competitor while claiming solidarity with it.