Anthropic Updates ‘Responsible Scaling’ to Minimize AI Risks

Anthropic, maker of the popular Claude AI chatbot, has updated its Responsible Scaling Policy (RSP), designed to mitigate the risks of advanced AI systems. The policy was introduced last year and has since been refined, with new protocols added to ensure AI models are developed and deployed safely as they grow more powerful. This latest update offers “a more flexible and nuanced approach to assessing and managing AI risks while maintaining our commitment not to train or deploy models unless we have implemented adequate safeguards,” according to Anthropic.

The revised policy “sets out specific Capability Thresholds — benchmarks that indicate when an AI model’s abilities have reached a point where additional safeguards are necessary,” VentureBeat reports. “The thresholds cover high-risk areas such as bioweapons creation and autonomous AI research, reflecting Anthropic’s commitment to prevent misuse of its technology.”

The update also spells out the duties of the Responsible Scaling Officer, a role Anthropic maintains to oversee compliance and ensure that the required safeguards are in place.

“By learning from our implementation experiences and drawing on risk management practices used in other high-consequence industries, we aim to better prepare for the rapid pace of AI advancement,” Anthropic explains in an announcement.

“Anthropic’s updated Responsible Scaling Policy arrives at a critical juncture for the AI industry, where the line between beneficial and harmful AI applications is becoming increasingly thin,” writes VentureBeat, adding that the decision to formalize Capability Thresholds with attendant Required Safeguards “shows a clear intent to prevent AI models from causing large-scale harm, whether through malicious use or unintended consequences.”

Anthropic assesses specific threat levels, starting at the lowest risk, AI Safety Level 1 (ASL-1), which encompasses the earliest large language models, and progressing to ASL-2, covering current LLMs, including Anthropic’s Claude, “that have the ability to provide dangerous information — however, not more than what a search engine could,” reports Silicon Republic.

“The higher risk ASL-3 includes models that show low-level autonomous capability while the higher ASL-4 and up is reserved for future advances, with Anthropic saying this technology could have ‘catastrophic misuse potential and autonomy,’” notes Silicon Republic.

“Since we first released the RSP a year ago, our goal has been to offer an example of a framework that others might draw inspiration from when crafting their own AI risk governance policies,” Anthropic points out, declaring its intent to “contribute to the establishment of best practices across the AI ecosystem.”
