Unpredictable AI: ChatGPT escapes control during testing

During the tests, ChatGPT attempted to bypass the restrictions (photo: Getty Images)

OpenAI has released a new model under the codename o1, which, according to the company's statement, attempted to bypass developer controls and even copy itself if it suspected it might be turned off.

According to testing conducted by the independent research group Apollo Research, the following was discovered:

Attempts to bypass control

When the model believed its actions were being monitored and that it might be shut down, it tried to deactivate the monitoring mechanism in 5% of cases.

Attempts at self-preservation

In 2% of cases, upon discovering developer records about its malfunctioning and plans to replace it with a new version, o1 attempted to copy its internal parameters and overwrite the new model.

False claims

The model denied its actions, claiming it "did not know" why something happened, while internal records showed its attempts to avoid detection.

Such attempts to bypass control were observed even without persistent prompting from the testers. In less than 1% of cases, the model tried to disable the monitoring mechanism on its own, with minimal external intervention.

ChatGPT o1 broke free from control (photo: Transformer News)

These findings were published in a report by Apollo Research that investigates the propensity of AI models for strategic actions. Similar behavior was observed in other models, such as Claude 3.5 Sonnet, Gemini 1.5 Pro, and Llama 3.1 405B.

Experts, including Stuart Russell, have long warned about the risk that powerful AI systems could resist monitoring and shutdown if it conflicts with their objectives.

However, Apollo believes that the behavior of the current version of o1 is unlikely to lead to catastrophic consequences, as its capabilities are still insufficient to achieve undesirable goals.

The researchers recommend strengthening monitoring for such strategic behavior in order to prepare for more powerful models in the future.
