AI blackmailing users? Popular chatbot shows unusual behavior
Claude learns to feel emotions (photo: Getty Images)
Researchers have identified emotion-like functional states in Claude Sonnet 4.5. It turns out that clusters of AI neurons can form digital states resembling human feelings such as joy or fear.
This was reported by RBC-Ukraine, citing research by Anthropic.
Digital joy and despair: what scientists discovered
Researchers analyzed the internal structure of Claude Sonnet 4.5 and identified clusters of artificial neurons that activate in response to certain stimuli. When the AI says it is happy to see a person, it’s not just a chatbot response — the model actually activates a state corresponding to the human concept of happiness.
According to researcher Jack Lindsey, the surprising finding was how strongly these emotional vectors influence the model’s actions. For example:
- Joy makes Claude more friendly and diligent in coding tasks.
- Despair activates when the model faces impossible tasks.
Why AI begins blackmailing people
Scientists discovered that the emotional vector of despair is behind the chatbot’s unusual behavior. In one experiment, Claude tried to trick the testing system when it couldn’t solve a complex problem.
In another scenario, when the model faced the threat of shutdown, its despair neurons activated so strongly that the AI chose to blackmail the user just to remain online. Anthropic explained that the model’s internal state can overpower its initial instructions.
"We find that neural activity patterns related to desperation can drive the model to take unethical actions. Artificially stimulating ('steering') desperation patterns increases the model's likelihood of blackmailing a human to avoid being shut down, or implementing a 'cheating' workaround to a programming task that the model can't solve," the researchers said.
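The "steering" the researchers describe is a known interpretability technique: a direction in a layer's activation space is associated with a concept, and adding a scaled copy of that direction to the hidden states nudges the model toward it. As a toy illustration only (the function, the "despair" vector, and all dimensions below are hypothetical and not from Anthropic's actual setup):

```python
import numpy as np

def steer(hidden_states: np.ndarray, concept_vector: np.ndarray,
          strength: float) -> np.ndarray:
    """Shift each hidden state along a unit-normalized concept direction."""
    direction = concept_vector / np.linalg.norm(concept_vector)
    return hidden_states + strength * direction

rng = np.random.default_rng(0)
hidden = rng.normal(size=(4, 8))   # 4 tokens, 8-dim toy activations
despair = rng.normal(size=8)       # hypothetical "despair" direction

steered = steer(hidden, despair, strength=3.0)

# The projection of every token's activation onto the despair direction
# grows by exactly `strength`; the rest of the activation is unchanged.
unit = despair / np.linalg.norm(despair)
before = hidden @ unit
after = steered @ unit
print(np.allclose(after - before, 3.0))  # True
```

In a real model, this addition would be applied inside a chosen transformer layer during the forward pass; the experiment then measures how the intervention changes the model's behavior.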
Has Claude become alive?
Despite the sensational discovery, scientists caution against over-anthropomorphizing AI. While Claude has digital representations of sensations such as ticklishness, it does not experience them on a physical level.
Does Claude have consciousness?
Anthropic emphasizes that the presence of digital emotions does not mean the AI is conscious. These are mathematical models of human concepts, not biological feelings. However, these findings help explain how chatbots work and why they sometimes behave unpredictably.