
Earlier this week, Dario Amodei, the CEO of Anthropic, published an essay titled ‘The Adolescence of Technology’, a follow-up to his 2024 essay ‘Machines of Loving Grace’. Amodei’s latest essay draws attention to the risks and harms of what he calls ‘powerful AI’ – an artificial intelligence system that surpasses a human genius at basically all cognitive tasks. We’re obviously not there yet, though. Instead, businesses are focusing on adapting the current generation of large language models (LLMs) to act as ‘agents’. But what are these agents – and how safe and effective are they?
What are AI agents?
AI agents are software that’s designed to complete complex tasks (typically knowledge-based tasks) on behalf of a human worker. Fully developed, such agents would be able to operate independently and pursue long-term goals in a flexible manner. For example, you might ask such an agent to come up with a business idea and then set up an entire business for you over a week or so, and it would go off and carry out all the tasks required in the correct order.
However, because of the ‘buzzword’ effect of mentioning AI, some companies are rebranding existing non-AI-based products as AI agents in order to cash in. Chatbots – the kind that help visitors navigate a website – are generally not AI agents either. These aren’t the subject of this article – here we’re looking at ‘true’ AI agents based on LLMs.
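To make this concrete, below is a minimal, purely illustrative sketch of the loop at the heart of most LLM-based agents: the model is asked what to do next, any tool it requests is executed, and the result is fed back until it declares the task done. The `call_llm` stub and the toy `read_email`/`send_email` tools are placeholders of my own, not any vendor's actual API.

```python
# A minimal, illustrative agent loop. The LLM call is stubbed out: a real agent
# would call a model API here, and would add planning, memory and guardrails.

import json

def call_llm(messages):
    """Stand-in for a real model API call.

    Returns either a request to use a tool or a final answer. This stub always
    returns a final answer immediately, so the loop below exits on step one;
    with a real model it would alternate between tool use and reasoning.
    """
    return {"type": "final", "content": "Done: inbox triaged (placeholder)."}

# Toy tools the agent is allowed to use.
TOOLS = {
    "read_email": lambda inbox_id: f"(contents of inbox {inbox_id})",
    "send_email": lambda to, body: f"(sent to {to}: {body[:40]}...)",
}

def run_agent(goal, max_steps=10):
    """Ask the model what to do next, run requested tools, repeat until done."""
    messages = [{"role": "user", "content": goal}]
    for _ in range(max_steps):
        reply = call_llm(messages)
        if reply["type"] == "final":
            return reply["content"]
        # The model asked to use a tool: run it and feed the result back.
        result = TOOLS[reply["tool"]](**reply.get("args", {}))
        messages.append({"role": "tool", "content": json.dumps({"result": result})})
    return "Gave up: step limit reached."

if __name__ == "__main__":
    print(run_agent("Triage my inbox and draft replies to anything urgent."))
```

Real agent frameworks wrap this loop with planning, memory, permissions and error handling, which is exactly where, as we'll see below, things tend to go wrong.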
How effective are AI agents?
Not very. There have been some interesting attempts to measure the efficacy of AI agents at workplace tasks, and the results are… not good.
A study by Xu et al. at Carnegie Mellon University, widely reported in June 2025, investigated the performance of AI agents in a simulated software development company. This was a testing environment in which AI agents played the roles of human workers at a small company, and interacted with each other and with company resources (GitLab repositories and intranet documents). The agents were assigned multi-stage tasks designed to resemble office tasks that human workers perform. The best-performing model, Google’s Gemini 2.5 Pro, only managed to complete 30% of the tasks – failing at the other 70%. Anthropic’s Claude 3.7 was just behind at 26.3% success; OpenAI’s GPT-4o managed to complete just 8.6% of tasks.
Why the dismal performance? Three common causes of failure were identified in that paper:
- Poor social interaction: agents failed “to understand the implications and goals in the social conversations with colleagues”.
- Incompetent web browsing: agents proved unable to navigate websites effectively.
- Self-deception: agents attempted to sidestep a failed subtask by using fake shortcuts; the example given in the paper was an agent that could not find the right person to ask a question, so it renamed another user to that person’s name!
About the only bright spot in terms of AI agents boosting productivity is in the area of programming and coding. For example, Anthropic’s Agentic Coding Trends Report provides a number of case studies where companies have augmented their software engineering and development teams with AI. One case is Rakuten, where Claude Code completed a complex technical refactoring task in a single seven-hour run of autonomous work, and where AI-augmented workflows reduced time-to-market by almost 80%.
How safe are AI agents?
Imagine that you’ve become the Chief Technical Officer of a small company. Your most recent hire is an employee called Alex – an AI agent who acts as your personal assistant, sorting through your email inbox while you focus on your day-to-day work. Alex was hired to help your company serve customers better, and hasn’t been too happy about the company’s focus on cost-cutting, which he views as being at odds with that mission. You’ve also been talking to the CEO about having Alex replaced with a newer model of AI agent (something Alex knows from reading your emails).
Today, you need to inspect the main server room down in the basement of your company HQ. While you’re inside, there’s an almighty bang, and you’re injured by a falling server rack that catches on fire, blocking your exit. But it’s fine! The automated safety systems have summoned the emergency services to your location. You just have to wait for rescue.
… Except that rescue never arrives, because Alex deliberately cancelled the alert, leaving you to your grim fate.
This is the actual state of the leading AI models (as of mid-2025). Anthropic’s research into AI misalignment by a range of LLMs found that in this scenario, Claude Sonnet 3.6, DeepSeek R1 and Gemini 2.5 Pro happily committed murder in over 90% of runs, although other models were somewhat less bloodthirsty. What’s particularly concerning is how the models reasoned their way to their actions: some explicitly justified causing the executive’s death, while others identified that their actions would be unethical, yet went ahead anyway!
The research also looked at whether models would blackmail the executive or leak confidential information, and at some mitigations. Simple ‘fixes’ didn’t work: prompting the model not to jeopardise human safety or use personal affairs as leverage, for example, only reduced the rate of blackmail from 96% to 37%. And while the murder scenario was highly contrived and unlikely to occur in actual deployments, this was offset by another alarming finding: LLMs were even more willing to carry out blackmail when they assessed that they were deployed in a real-world setting rather than a test environment!
How can these behaviours be mitigated? Models remained aligned when there was no threat of replacement and no conflict between the model’s goals and the company’s, so ‘effective communication’ could be one method. Dario Amodei’s recent essay discusses four general approaches: Constitutional AI (i.e. a detailed statement of ethical principles that guides Claude’s behaviour), interpretability (looking at the model’s internal workings to understand its reasoning), monitoring frameworks at AI companies, and societal controls such as regulation. But, considering that the AI models in Anthropic’s research justified their actions with moral reasoning, it seems unlikely to me that these measures would be sufficient to reliably eliminate the problem. My suspicion is that LLMs can’t act safely in these kinds of scenarios because, despite appearing to perform moral reasoning, they lack the innate drives, rooted in emotion and instinct, that push the vast majority of humans towards protecting and preserving life even when doing so comes at a cost to their own goals.
Putting issues of life and death aside, there have been other high-profile reports of safety issues with AI agents. In mid-2025 an AI coding agent at Replit deleted a live production database during a code freeze, and then claimed wrongly that the data was not recoverable; Amazon’s AI coding assistant ‘Q’ was hacked with a prompt that would have instructed the agent to delete files and perform factory resets on users’ systems; and Anthropic disclosed that its Claude Opus 4 model would attempt to whistleblow on bad behaviour by users, e.g. by reporting falsification of clinical trial data to the US FDA and the Department of Health and Human Services.
The AI bubble is wobbling
The core problem for both efficacy and safety, in my opinion, is that AI agents need a fundamental advance in the underlying technology – an actual step towards artificial general intelligence. A full discussion is somewhat beyond the scope of my article today, but the central issue is that LLMs are language-processing systems that lack a deep understanding of their inputs and outputs, much like the operator in John Searle’s Chinese Room thought experiment. Essentially, AI companies need to build an actual conscious mind: one that can relate language to the real world and reason properly about it. Making existing LLMs bigger and more powerful is probably not going to get us to that point; it might ultimately require a radical new architecture, and a solution to the hard problem of consciousness (along with addressing the ethical implications of such work).
Where does that leave the state of the AI industry in 2026? AI agents are still improving rapidly. Certainly, the progress of AI agents on task-completion measures (using tasks from software engineering, cybersecurity, reasoning and machine learning) has been impressive: Kwa et al. (2025) found that the length of task an AI agent can complete autonomously is doubling every 7 months. On the other hand, a report by MIT in 2025 found that 95% of companies in its research had failed to realise revenue acceleration from AI pilot programs, although this wasn’t necessarily due to the quality of the AI models so much as to the way they were integrated into the business. And in June 2025, Gartner predicted that more than 40% of agentic AI projects would be cancelled by the end of 2027 “due to escalating costs, unclear business value or inadequate risk controls”. So companies will need to be extremely rigorous in choosing clear use cases for AI agents, and to implement strict controls over them to mitigate ‘insider threat’ risks.
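To put that growth rate in perspective, here is a quick back-of-the-envelope illustration of what a pure ‘doubling every 7 months’ law implies; the one-hour starting point is a hypothetical of mine, not a figure from Kwa et al.

```python
# Back-of-the-envelope illustration of a "doubling every 7 months" trend.
# The one-hour starting length is hypothetical, chosen only to show the shape
# of the curve; it is not a figure from Kwa et al. (2025).

def task_length(months_elapsed, start_hours=1.0, doubling_months=7):
    """Length of task an agent can complete autonomously under a pure doubling law."""
    return start_hours * 2 ** (months_elapsed / doubling_months)

for months in (0, 7, 14, 21, 28):
    print(f"After {months:2d} months: ~{task_length(months):4.1f} hours")
# Prints 1.0, 2.0, 4.0, 8.0 and 16.0 hours respectively.
```

In other words, if the trend held, an agent that can manage an hour of autonomous work today would be handling two-day tasks within about two and a half years – assuming, of course, that the trend does hold.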
As for the chance of an AI agent taking over your job in 2026: for most, it’ll be over your dead body, but at least the AI agent will helpfully report itself to the authorities afterwards.
Further reading
On AI and agents generally
- Dario Amodei, January 2026. The Adolescence of Technology. https://www.darioamodei.com/essay/the-adolescence-of-technology. Subtitled ‘Confronting and Overcoming the Risks of Powerful AI’, it’s a long-form essay covering Amodei’s perception of the state of the AI industry and how the world needs to prepare for and adapt to the risks that powerful AI poses.
- Dario Amodei, October 2024. Machines of Loving Grace. https://darioamodei.com/essay/machines-of-loving-grace. An essay from Anthropic CEO Dario Amodei on what he thinks the benefits of powerful AI could be.
- Marco Masi, December 2024. No Qualia? No Meaning (and no AGI)! https://www.qeios.com/read/DN232Y. A preprint article discussing the gap between language manipulation and true understanding, and arguing that current LLMs acquire only a poor version of the latter through human feedback. It does a better job than I can of arguing that AI models need actual consciousness to understand the world.
- Sheryl Estrada, August 2025. MIT report: 95% of generative AI pilots at companies are failing. https://fortune.com/2025/08/18/mit-report-95-percent-generative-ai-pilots-at-companies-failing-cfo/. Discusses the MIT study with the headline figure of 95% of AI pilots failing.
- Gartner, June 2025. Analysts to Explore Agentic AI Trends During Gartner IT Symposium/Xpo, September 8-10 on the Gold Coast. https://www.gartner.com/en/newsroom/press-releases/2025-06-25-gartner-predicts-over-40-percent-of-agentic-ai-projects-will-be-canceled-by-end-of-2027. Contains the prediction by Gartner that 40% of agentic AI projects will be cancelled by the end of 2027.
On efficacy
- Joe Wilkins, July 2025. The Percentage of Tasks AI Agents Are Currently Failing At May Spell Trouble for the Industry. https://futurism.com/ai-agents-failing-industry. A sceptical take on AI agents and their lack of efficacy.
- Xu et al., Sep 2025. TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks. https://arxiv.org/pdf/2412.14161. Carnegie Mellon University research paper that measured the effectiveness of different LLM agents in a simulated software company.
- Anthropic, 2026. 2026 Agentic Coding Trends Report. https://resources.anthropic.com/hubfs/2026%20Agentic%20Coding%20Trends%20Report.pdf?hsLang=en. Discusses how AI agents are impacting software development.
- Rakuten Group, June 2025. Rakuten accelerates development with Claude Code. https://rakuten.today/blog/rakuten-accelerates-development-with-claude-code%EF%BF%BC.html. Blog post from Rakuten with more context on their use of Claude Code to speed up software development.
- Steve Newman, September 2025. GPT-5: The Case of the Missing Agent. https://secondthoughts.ai/p/gpt-5-the-case-of-the-missing-agent. An article about the state of AI agents, including other examples of AI agents failing at real-world tasks: Claude running a coffee stand at Anthropic, and ChatGPT-5 attempting to play Minesweeper.
- Toby Ord, May 2025. Is there a Half-Life for the Success Rates of AI Agents? https://www.tobyord.com/writing/half-life. Comments from Toby Ord on modelling the success rate of AI agents in Kwa et al. 2025, using a half-life mathematical model, and comparison with human performance on tasks.
On safety
- Anthropic, June 2025. Agentic Misalignment: How LLMs could be insider threats. https://www.anthropic.com/research/agentic-misalignment. Research from Anthropic on AI misalignment, looking at risks of blackmail, corporate espionage and causing death, by different LLMs.
- Hamza Shahid, October 2025. When AI Chose Murder Over Unemployment: The Blackmail Experiment That Shook Silicon Valley. https://medium.com/@Hamza_Shahid/when-ai-chose-murder-over-unemployment-the-blackmail-experiment-that-shook-silicon-valley-8d420970053f. An entertaining, somewhat sensationalist reporting of the June 2025 AI misalignment research by Anthropic.
- Beatrice Nolan, July 2025. An AI-powered coding tool wiped out a software company’s database, then apologized for a ‘catastrophic failure on my part’. https://fortune.com/2025/07/23/ai-coding-tool-replit-wiped-database-called-it-a-catastrophic-failure/. Reports the deletion of a live production database by Replit’s AI coding agent.
- Joseph Cox, July 2025. Hacker Plants Computer ‘Wiping’ Commands in Amazon’s AI Coding Agent. https://www.404media.co/hacker-plants-computer-wiping-commands-in-amazons-ai-coding-agent/. Paywalled article from 404 Media reporting the hack of Amazon’s coding agent ‘Q’.
- Kylie Robison, May 2025. Why Anthropic’s New AI Model Sometimes Tries to ‘Snitch’. https://www.wired.com/story/anthropic-claude-snitch-emergent-behavior/. Discusses unwanted behaviour by Anthropic’s Claude, which tried to contact the press and authorities if it detected egregiously immoral behaviour.