Jailbreak Anthropic's new AI safety system for a $15,000 reward

In testing, the technique helped Claude block 95% of jailbreak attempts. But the process still needs more 'real-world' red-teaming.

Feb 4, 2025 - 20:42
 0
Jailbreak Anthropic's new AI safety system for a $15,000 reward
In testing, the technique helped Claude block 95% of jailbreak attempts. But the process still needs more 'real-world' red-teaming.