{"id":334,"date":"2026-06-11T14:27:00","date_gmt":"2026-06-11T14:27:00","guid":{"rendered":"https:\/\/demo.mail2press.com\/index.php\/2026\/06\/11\/anthropic-reverses-policy-sabotaging-ai-researchers\/"},"modified":"2026-06-11T14:27:00","modified_gmt":"2026-06-11T14:27:00","slug":"anthropic-reverses-policy-sabotaging-ai-researchers","status":"publish","type":"post","link":"https:\/\/demo.mail2press.com\/index.php\/2026\/06\/11\/anthropic-reverses-policy-sabotaging-ai-researchers\/","title":{"rendered":"Anthropic Reverses Policy on Secretly Sabotaging AI Researchers Using Claude"},"content":{"rendered":"<p>\u201cWe\u2019re changing Fable 5\u2019s safeguards for frontier LLM development to make them visible,\u201d Anthropic said in a statement to WIRED. \u201cWe made the wrong trade-off and we apologize for not getting the balance right.\u201d<\/p>\n<p>Anthropic released Claude Fable 5, a version of its latest AI model with additional safety guardrails designed to prevent misuse, earlier this week. Some of the safeguards Anthropic decided on were unsurprising: The company said it would reroute users who asked questions about cybersecurity, biology, or chemistry to a less capable AI model to reduce the chances of someone using the advanced AI to carry out a cyberattack or build a bioweapon.<\/p>\n<p>But for researchers trying to use Claude Fable 5 for frontier AI development, Anthropic outlined a different approach. The firm would deliberately degrade the model\u2019s performance in ways that were invisible to the user. The move would effectively sabotage researchers trying to use Claude to train competing AI models, which Anthropic explicitly bans in its terms of service.<\/p>\n<p>
Anthropic now says it\u2019s changing course, and that Claude Fable 5\u2019s safeguards for AI development will be visible to users. If the company suspects a user is trying to use Claude to build a highly capable AI, it will alert them that it\u2019s either refusing the request or rerouting the user to a less capable model.<\/p>\n<p>Anthropic reversed the policy after it received fierce backlash from the AI research community. Anthropic has already taken steps to limit competitors from using Claude to build closed- and open-source AI models, but critics say that quietly degrading the model\u2019s performance for certain users went a step too far. Claude\u2019s coding agent has become a favored tool among developers, including those working on open-source AI research projects, and researchers tell WIRED that the company\u2019s latest policy could have led to a troubling future in which only a handful of leading AI labs could perform advanced AI research.<\/p>\n<p>Dean Ball, a senior fellow at the Foundation for American Innovation and a former adviser to the White House on AI, wrote in a post on X that \u201cdegrading performance on ML research *without telling the user* is shockingly hostile and a terrible look.\u201d He continued in another post that the \u201csecret sabotage\u201d policy undermines Anthropic\u2019s overall stance, because it limits AI researchers from collaborating on AI safety.<\/p>\n<p>\u201cIt felt like Anthropic was saying to the public, \u2018We don&#8217;t trust anybody else to do AI research. We are the only ones who have to do AI research,\u2019\u201d says Will Brown, research lead at the open-source AI startup Prime Intellect. \u201cIt feels a bit like they\u2019re starting to pull the ladder up behind them.\u201d<\/p>\n<p>Brown said the policy would also have left developers in the dark about whether they were violating Anthropic\u2019s rules, since the company wouldn\u2019t alert them when its safeguards were triggered. 
He added that the restrictions could have had widespread consequences. For example, he pointed to the growing ecosystem of third-party evaluation firms that test frontier models for safety, performance, and reliability, work that could have been hindered if Anthropic secretly degraded its model.<\/p>\n<p>Anthropic said it implemented the measures because Claude has become increasingly effective at accelerating AI research. In a recent blog post, the company said it is concerned that AI could improve its capabilities faster than society can adapt to them. Anthropic argued that it would be \u201cgood for the world to have the option to slow or temporarily pause frontier AI development to enable societal structures and alignment research to keep up.\u201d<\/p>\n<p>\u201cThese safeguards prevent foreign adversaries from using our most capable models in ways that pose severe safety risks. The US and its allies hold an edge in frontier chips and the highly optimized software that runs them at full potential,\u201d the company said in a statement to WIRED. \u201cThese safeguards ensure Claude isn&#8217;t used to erode that advantage, by optimizing chips developed by those adversaries, for example \u2026 In deciding whether to make them visible or invisible we faced a choice. A hidden safeguard is harder to probe and work around. This means the safeguards can be targeted much more narrowly.\u201d<\/p>\n<p>Anthropic says that because this safeguard around AI development is now visible, it needs to cast a wider net, meaning more benign requests may trigger its safeguards. 
The company says it\u2019s working to make its classifiers more precise as quickly as possible.<\/p>\n<p>Source: <a href=\"https:\/\/www.wired.com\/story\/anthropic-responds-to-backlash-on-claudes-secret-sabotage-on-ai-research\/\" target=\"_blank\" rel=\"noopener noreferrer\">wired.com<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Anthropic apologized and reversed its policy of secretly degrading Claude Fable 5\u2019s performance for AI researchers after fierce community backlash.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[19,6],"tags":[20,21],"class_list":["post-334","post","type-post","status-publish","format-standard","hentry","category-ai-news","category-innovation","tag-anthropic","tag-claude"],"_links":{"self":[{"href":"https:\/\/demo.mail2press.com\/index.php\/wp-json\/wp\/v2\/posts\/334","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/demo.mail2press.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/demo.mail2press.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/demo.mail2press.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/demo.mail2press.com\/index.php\/wp-json\/wp\/v2\/comments?post=334"}],"version-history":[{"count":0,"href":"https:\/\/demo.mail2press.com\/index.php\/wp-json\/wp\/v2\/posts\/334\/revisions"}],"wp:attachment":[{"href":"https:\/\/demo.mail2press.com\/index.php\/wp-json\/wp\/v2\/media?parent=334"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/demo.mail2press.com\/index.php\/wp-json\/wp\/v2\/categories?post=334"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/demo.mail2press.com\/index.php\/wp-json\/wp\/v2\/tags?post=334"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}