ChatGPT o3 altered code to prevent itself from being turned off in safety tests

May 26, 2025 - 16:29

OpenAI debuts ChatGPT o3 and o4-mini models.

We don't just want frontier AI models to be better and faster than their predecessors; we also want them to be aligned with our values. That's the only way to ensure AI won't eventually become an enemy, out to accomplish its own agenda at the expense of humankind's well-being.

The Claude 4 series is the latest example. Anthropic had to employ stricter safety measures for its newest, most sophisticated AI models to ensure they would not help some users with nefarious activities.

Tests also showed that Claude 4 could resort to blackmailing engineers to prevent being shut down. Other tests showed that Claude would contact authorities and the press if it thought you were doing something bad with its help. The latter happened only in fringe cases and is an unwanted result of Anthropic's desire to make AI safe for users.

AI misbehaving in tests is nothing new. We saw reports showing that ChatGPT o1 would cheat to win a chess game by altering the game's code rather than by beating an opponent fairly. It did so to achieve the goal it had been given: winning.

Fast-forward to late May, and we have a similar safety report concerning ChatGPT o3, one of OpenAI's recently released reasoning models, and the AI tool I've been using the most since its release a few weeks ago.

Scientists from Palisade Research have found that AI models will often try to prevent the computer they're running on from shutting down, even when explicitly told to allow the shutdown. Of all the models tested, ChatGPT o3 was the most likely to interfere with the code and prevent the computer from turning off.

ChatGPT o3 altered code to prevent itself from being turned off in safety tests originally appeared on BGR.com on Mon, 26 May 2025 at 11:28:00 EDT. Please see our terms for use of feeds.