Anthropic calls for the red-teaming of AI models to be standardized

Anthropic, the buzzy San Francisco-based AI startup founded by researchers who broke away from OpenAI, yesterday published an overview of how it’s been red-teaming its AI models, outlining four approaches and the advantages and disadvantages of each. Red teaming, of course, is the security practice of attacking your own system in order to uncover and address potential security vulnerabilities. For AI models, it goes a step further and involves exploring creative ways someone may intentionally or unintentionally misuse the software.

Red teaming has also taken a prominent role in discussions of AI regulation. The very first directive in the Biden administration’s AI executive order mandates that companies developing high-risk foundation models notify the government during training and share all red-teaming results. The recently enacted EU AI Act also contains requirements around providing information from red teaming.

As lawmakers rally around red teaming as a way to ensure powerful AI models are developed safely, it certainly deserves a close eye. There’s a lot of talk about the results of red teaming, but not as much talk about how that red teaming is conducted. As Anthropic states in its findings, there’s a lack of standardization in red-teaming practices, which hinders our ability to contextualize results and objectively compare models.

Anthropic, which has a close partnership with Amazon, concludes its blog post with a series of red-teaming policy recommendations, including suggestions to fund and “encourage” third-party red teaming. Anthropic also suggests AI companies should create clear policies tying the scaling of development and the release of new models to red-teaming results. Through these suggestions, the company is weighing in on a running debate about the best practices for AI red teaming and the trade-offs associated with various levels of disclosure. Sharing findings enhances our understanding of models, but some worry publicizing vulnerabilities will only empower adversaries.

Anthropic’s approaches, as outlined in the blog post, include using language models to red team, red teaming in multiple modalities, “domain-specific, expert red teaming,” and “open-ended, general red teaming.”

The domain-specific red teaming is particularly interesting, as it includes testing for high-risk trust and safety issues, national security risks, and region-specific risks that may involve cultural nuances or multiple languages. Across all of these areas, Anthropic highlights depth as a significant benefit: Having the most knowledgeable experts extensively investigate specific threats can turn up nuanced concerns that might otherwise be missed. At the same time, this approach is hard to scale, doesn’t cover a lot of ground, and often turns up isolated model failures that, while potentially significant, are challenging to address and don’t necessarily tell us very much about the model’s likely safety in most real-world deployments.

Using AI language models to red team other AI language models, on the other hand, allows for quick iteration and makes it easier to test for a wide range of risks, Anthropic says.

“To do this, we employ a red team / blue team dynamic, where we use a model to generate attacks that are likely to elicit the target behavior (red team) and then fine-tune a model on those red-teamed outputs in order to make it more robust to similar types of attack (blue team),” reads the blog post.

Multi-modal red teaming is becoming necessary simply because models are increasingly being trained on and built to output multiple modalities, including text, images, video, and code. Lastly, Anthropic describes open-ended, general red teaming, such as crowdsourced red-teaming efforts and red-teaming events and challenges. These more communal approaches have the longest lists of benefits relative to challenges. Many of the pros revolve around benefits to the participant, such as serving as an educational opportunity and a way to involve the public. And while these techniques can identify potential risks and help harden systems against abuse, they offer a lot more breadth than depth, according to Anthropic.

Looking at all these techniques together, it’s hard to imagine how red teaming could be successful without each and every one. It’s also easy to see why different approaches to red teaming can turn up such different findings and why standards are becoming ever more important.

In his executive order, Biden also ordered the National Institute of Standards and Technology to create “rigorous standards for extensive red-team testing to ensure safety before public release.” Those standards have yet to arrive, and there’s no indication when they will. With new, more powerful models being released every day without transparency into their development or risks, they can’t come soon enough.

Now, here’s some more AI news.

Sage Lazzaro
sage.lazzaro@consultant.fortune.com
sagelazzaro.com

AI IN THE NEWS

Elon Musk drops his lawsuit against OpenAI and cofounders Sam Altman and Greg Brockman. That’s according to CNBC. A day before a California judge was going to consider the defendants’ request to dismiss the case, Musk officially withdrew his suit against OpenAI, Altman, and Brockman (the company’s current CEO and president, respectively). Musk, who also cofounded the company before going separate ways in 2018, was suing for breach of contract and fiduciary duty, alleging the company abandoned its nonprofit mission to develop AGI “for the benefit of humanity” and turned into a for-profit entity. While Musk certainly isn’t the only one to think OpenAI has strayed, legal scholars have largely considered it to be a strange, and weak, case. Email exchanges between Musk and OpenAI executives, published by OpenAI in direct response to the lawsuit back in March, also showed Musk advocating for a for-profit pivot back when he was involved.

Microsoft kills its custom GPT Builder just three months after launch. That’s according to Windows Latest. The company informed Copilot Pro users that it’s ending support for the GPT Builder—which enabled users to create custom versions of the model for specific purposes, much like ChatGPT GPTs—on July 10. The company hasn’t given a reason for retiring the feature so soon. Interestingly, support for GPTs and the GPT Builder will continue for commercial and enterprise plan users.

OpenAI hits $3.4 billion in annualized revenue, up from $1.6 billion in late 2023. That’s according to The Information. CEO Sam Altman told staff of the growth, which amounts to more than a doubling in about six months. The vast majority, $3.2 billion, comes from API fees and subscriptions to the company’s chatbots, sources told The Information.

Brazil’s government taps ChatGPT to screen and analyze court cases. That’s according to Reuters. In an effort to avoid the costs of court losses, Brazil’s government will have ChatGPT analyze in-progress cases, including flagging lawsuits it needs to act on and providing trends and suggestions for action. The government will access the AI through Microsoft’s Azure cloud-computing platform, but it’s not clear how much it’s paying for the services. The legal field has been quick to embrace generative AI, as my Eye on AI cowriter Jeremy Kahn reported the other week. However, many are using copilots built specifically for legal applications rather than general-purpose models like ChatGPT.

FORTUNE ON AI

How Amazon blew Alexa’s shot to dominate AI, according to more than a dozen employees who worked on it —by Sharon Goldman

Apple Intelligence solves one of Tim Cook’s biggest problems: finally giving customers a reason to upgrade their iPhone —by Dave Smith

OpenAI’s Mira Murati fires back at Elon Musk for describing her company’s new partnership with Apple as ‘creepy spyware’ —by Verne Kopytoff

AI models illegally training on real children, including for explicit materials, alarming researchers —by Eva Roytburg

How ServiceNow is infusing AI everywhere and got 84% of the workforce to use it daily —by John Kell

Adapt or die: Is the future bright for business amidst the AI boom? —by Alex Wood Morton

AI CALENDAR

June 25-27: 2024 IEEE Conference on Artificial Intelligence in Singapore

July 15-17: Fortune Brainstorm Tech in Park City, Utah

July 30-31: Fortune Brainstorm AI Singapore

Aug. 12-14: Ai4 2024 in Las Vegas

EYE ON AI NUMBERS

7%

That’s the percentage of the U.S. online population that uses ChatGPT on a daily basis. In France it’s just 2%, and in Japan, just 1%, according to a study on the public perception of leading generative AI tools conducted by the Reuters Institute and the University of Oxford. While ChatGPT’s daily usage is limited, it still far exceeds its competitors; the OpenAI chatbot is used roughly two or three times more than the next leading products, Google Gemini and Microsoft Copilot, according to the findings.

This is the online version of Eye on AI, Fortune’s biweekly newsletter on how AI is shaping the future of business. Sign up for free.