Jailbreak Anthropic's new AI safety system for a $15,000 reward

In testing, the technique helped Claude block 95% of jailbreak attempts. But the process still needs more 'real-world' red-teaming.

Feb 4, 2025 - 20:42

0

Jailbreak Anthropic's new AI safety system for a $15,000 reward

In testing, the technique helped Claude block 95% of jailbreak attempts. But the process still needs more 'real-world' red-teaming.

Tags:

Previous Article

Threads now lets you share your custom feeds

Samsung might have finally fixed the Z Fold series' Achilles heel

Related Posts

The best floodlights for outdoor security of 2025

The best floodlights for outdoor security of 2025

Feb 20, 2025 0

My favorite Apple Watch is finally back down to $169, even before Amazon's Spring Sale kicks off

My favorite Apple Watch is finally back down to $169, e...

Mar 23, 2025 0

I tried Lenovo's foldable OLED laptop at MWC and it's left me with some questions

I tried Lenovo's foldable OLED laptop at MWC and it's l...

Mar 3, 2025 0

This site uses cookies. By continuing to browse the site you are agreeing to our use of cookies.