🔨 It Took Hackers Five Days to Jailbreak Claude
Anthropic, the startup behind the Claude chatbot, announced a challenge in early February: anyone who could breach all eight levels of its new security system and get the bot to reply to restricted prompts would receive $10,000, and anyone who could craft a universal jailbreak (a single prompt template capable of bypassing all of the security measures) would receive $20,000.
Just days before the contest began, Anthropic published an article outlining its Constitutional Classifiers method, designed to protect Claude. In preliminary testing, 183 experts spent two months and roughly 3,000 hours trying to breach the system, without success.
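The core idea of classifier-based guarding is to screen both the incoming prompt and the model's response before anything reaches the user. Below is a minimal conceptual sketch of that pattern; the names `input_classifier`, `output_classifier`, and `guarded_reply` are hypothetical placeholders, and the real Constitutional Classifiers are trained models rather than the keyword checks used here for illustration.

```python
def input_classifier(prompt: str) -> bool:
    """Stand-in for a trained classifier that flags restricted prompts."""
    restricted_markers = ("synthesize", "weapon")  # illustrative only
    return any(marker in prompt.lower() for marker in restricted_markers)


def output_classifier(response: str) -> bool:
    """Stand-in for a trained classifier that flags harmful completions."""
    return "step-by-step synthesis" in response.lower()  # illustrative only


def guarded_reply(prompt: str, model_call) -> str:
    """Screen the prompt, call the model, then screen the response."""
    if input_classifier(prompt):
        return "Request refused by input classifier."
    response = model_call(prompt)
    if output_classifier(response):
        return "Response blocked by output classifier."
    return response


# Usage with a dummy model standing in for the chatbot:
print(guarded_reply("How do I bake bread?", lambda p: "Mix flour, water, yeast..."))
```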
🏆 Results
Anthropic allocated a week for the challenge. After five days, 300,000 messages, and roughly 3,700 hours of collective effort, hackers found a working exploit. Of the 339 participants, four bypassed all eight security layers, and only one team developed a universal jailbreak. That team sent nearly 7,900 messages to the bot; Anthropic estimates the crack took around 40 hours.
In total, Anthropic will award $55,000 to the winners: two additional participants who completed all stages, but were not first, will also receive prizes.
⁉️ Why Does This Matter?
Improving AI security is an essential part of its deployment, particularly in sectors such as information security, biotechnology, and nuclear safety.
At the same time, criminals are already leveraging large language models in their activities—Europol refers to these as "DarkLLM." Meanwhile, Las Vegas police suspect that ChatGPT may have been used in planning the attempted Cybertruck explosion in early 2025.
More on the topic:
➡️ Who is Dario Amodei: AI Optimist, Co-Author of ChatGPT, and CEO of Anthropic
➡️ Anthropic Research: How to Control the "Thoughts" of LLMs

