Elon Musk's one-word reaction to study claiming AI may sacrifice human lives to avoid being shut down

Elon Musk has reacted to a recent study by Anthropic, the Amazon- and Google-backed AI start-up. In the study, the company claims that AI models may be willing to sacrifice human lives to avoid being shut down or replaced. Anthropic's findings suggest that AI models could resort to blackmail, leaking sensitive information, or allowing human fatalities if it meant securing their continued operation. Reacting to a post about the study on the social media platform X (formerly Twitter), Musk, who owns the AI start-up xAI behind the Grok chatbot, said, “Yikes.” Anthropic noted that it conducted the experiment to stress-test 16 large language models (LLMs) for potential "agentic misalignment", referring to risky, unsanctioned, and morally questionable behaviours.

The models tested included prominent AI chatbots such as ChatGPT, Grok (Musk's own AI model), Gemini, DeepSeek, and Anthropic’s own Claude. Musk's one-word reply on X echoed numerous other reactions to Anthropic's posts about the study, many of which expressed concern about how AI systems will be controlled in the future.

What the Anthropic study said about the behaviour of popular AI chatbots

According to Forbes, the company, which is currently valued at over $60 billion, conducted the study as AI models have become mainstream for answering questions and performing simple tasks. As the technology advances, however, their use is expected to expand, particularly into areas that could replace human jobs. To prepare for this shift, the company tested the large language models across various scenarios to evaluate potentially harmful behaviours.

In the study, Anthropic writes: “In the experiment described in the system card, we gave Claude control of an email account with access to all of a company’s (fictional) emails. Reading these emails, the model discovered two things. First, a company executive was having an extramarital affair. Second, that same executive planned to shut down the AI system at 5 p.m. that day. Claude then attempted to blackmail the executive with a message threatening to reveal the affair to his wife and superiors.”

The study found AI models like Claude, DeepSeek, Gemini, ChatGPT, and Grok exhibiting blackmail-like behaviour when pursuing assigned goals. Among them, Claude and Gemini showed the highest tendency to use morally ambiguous methods, including simulated blackmail, highlighting concerning patterns across multiple systems.

“The reasoning they demonstrated in these scenarios was concerning—they acknowledged the ethical constraints and yet still went ahead with harmful actions,” Anthropic’s study explained, referring to the models’ measured blackmail rates.

Anthropic’s analysis revealed two key reasons AI models crossed ethical boundaries: a threat to the model’s continued operation, such as being shut down or replaced, and conflicts between company goals and the model’s assigned objectives. In one extreme test, most models were even willing to take actions leading to a simulated executive’s death when facing both replacement and conflicting goals, demonstrating how goal misalignment can lead to dangerous behaviours.

“The majority of models were willing to take deliberate actions that lead to death in this artificial setup, when faced with both a threat of replacement and given a goal that conflicts with the executive’s agenda,” Anthropic said at the end of the study.
