U.K. startup Aligned AI claims breakthrough in CoinRun game designed to test AI safety

A small startup in Oxford, England says it has achieved an important breakthrough in AI safety that could make self-driving cars, robots, and other AI-based products far more reliable for widespread use.

Aligned AI, a one-year-old company, says it has developed a new algorithm that allows AI systems to form more sophisticated associations that are more akin to human concepts. The achievement, if it holds up in real-world testing, could overcome a common problem with current AI systems, which often draw spurious correlations from the data they’re trained on, leading to disastrous consequences outside the lab.

The danger of such incorrect correlations—or “misgeneralizations” in AI lingo—was made tragically clear in 2018 when an Uber self-driving car struck and killed a woman crossing the road in Arizona. The training data Uber had fed the car’s AI software had only ever depicted pedestrians walking in crosswalks. So while Uber’s engineers thought the software had learned to detect pedestrians, all it had actually learned was to identify crosswalks. When it encountered a woman crossing the street outside a crosswalk, it failed to register the woman as a pedestrian at all, and plowed right into her.

According to Rebecca Gorman, cofounder and CEO of Aligned AI, the company’s so-called algorithm for concept extraction, or ACE, is much better at avoiding such spurious connections.

Gorman told Fortune she saw potential uses for the new algorithm in areas such as robotics. Ideally, we’d want a robot that has learned to pick up a cup in a simulator to be able to generalize that knowledge to picking up cups of different sizes and shapes in different environments and lighting conditions, so it could be used in any setting without retraining. That robot would also ideally know how to operate safely around people without needing to be confined in a cage, as many industrial robots are today.

“We need ways for those AIs that are operating without continual human oversight to still act in a safe way,” she said. She said ACE could also be useful for content moderation on social media or internet forums. ACE previously excelled on a test for detecting toxic language.

The AI scored highly on a special video game similar to Sonic the Hedgehog

To demonstrate the prowess of the ACE model, Aligned AI set it loose on a simple video game called CoinRun.

CoinRun is a simplified version of a game like Sonic the Hedgehog, but it’s used by AI developers as a challenging benchmark to evaluate how well a model can overcome the tendency to make spurious connections. A player, in this case an AI agent, has to navigate a maze of obstacles and hazards, avoiding monsters, while searching for a gold coin and then escaping to the next level of the game.

CoinRun was created by researchers at OpenAI in 2018 as a simple environment to test how well different AI agents could generalize to new scenarios. This is because the game presents the AI agents with an endless series of levels in which the exact configuration of the challenges the agent must overcome—the location of the obstacles, pits, and monsters—keeps changing.

But in 2021, researchers at Google DeepMind and a number of British and European universities realized that CoinRun could actually be used to test whether agents “misgeneralized,” that is, learned a spurious correlation. That is because in the original version of CoinRun, the agent always spawned in the top left corner of the screen and the coin always appeared at the lower right corner of the screen, where the agent could exit to the next level. So AI agents would learn to always go to the lower right. In fact, if the coin was placed elsewhere, the AI agents would often ignore the coin and still go to the lower right. In other words, the original CoinRun was supposed to be training coin-seeking agents but instead trained lower-right-corner-seeking agents.
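The problem can be illustrated with a toy grid that is far simpler than the real CoinRun game. The sketch below is purely illustrative: the grid size, the two hand-written policies, and the `success_rate` helper are all assumptions made for this example, not part of the benchmark or of any agent tested on it. It shows that a corner-seeking agent and a coin-seeking agent look identical when the coin always sits in the lower right corner, and only come apart once the coin moves.

```python
from itertools import product

# Hypothetical toy grid; the real CoinRun is a side-scrolling platformer.
HEIGHT, WIDTH = 4, 6
CORNER = (HEIGHT - 1, WIDTH - 1)  # lower right cell, as (row, column)


def corner_seeking_policy(coin):
    """Always ends the episode in the lower right corner, ignoring the coin."""
    return CORNER


def coin_seeking_policy(coin):
    """Always ends the episode on the coin, wherever it is."""
    return coin


def success_rate(policy, coin_positions):
    """Fraction of levels in which the agent finishes on the coin's cell."""
    hits = sum(policy(coin) == coin for coin in coin_positions)
    return hits / len(coin_positions)


# Training levels: the coin is always in the lower right corner.
training_levels = [CORNER] * 100
# Shifted levels: the coin appears once in every cell of the grid.
shifted_levels = list(product(range(HEIGHT), range(WIDTH)))

for name, policy in [("corner-seeking", corner_seeking_policy),
                     ("coin-seeking", coin_seeking_policy)]:
    print(name,
          "training:", success_rate(policy, training_levels),
          "shifted:", success_rate(policy, shifted_levels))
```

On the training levels both policies succeed every time, so the score alone cannot tell them apart. On the shifted levels the corner-seeking policy succeeds only in the one level where the coin happens to be in the corner.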

It is actually very difficult to get agents not to misgeneralize. This is especially true in situations where the agent cannot be given a new reward signal continuously and simply has to follow the strategy it developed in training. Under such conditions, the previous best AI software could only get the coin 59% of the time. This is only about 4% better than an agent just performing random actions. But an agent trained using ACE got the coin 72% of the time. The researchers showed that the ACE agent now seeks out the coin, rather than running right past it. It also understands situations where it can race to grab a coin and advance to the next level before being eaten by an approaching monster, whereas the standard agent in that situation remains stuck in the left corner, too afraid of the monster to advance—because it thinks the goal of the game is to get to the lower right of the screen, not to get the coin.

ACE works by noticing differences between its training data and new data—in this case, the location of the coin. It then formulates two hypotheses about what its true objective might be based on these differences—one the original objective that it learned from training (go to the lower right), and the other a different objective (seek the coin). It then tests which one seems to best account for the new data. It repeats this process until it finds an objective that seems to fit data differences it has observed.
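Aligned AI has not published the details of ACE in this article, so the following is only a rough sketch of the general idea of comparing candidate objectives against new data, not the company’s algorithm. It assumes, for illustration, that each example records where the agent finished, where the coin was, and whether the episode counted as a success. The names `HYPOTHESES`, `best_hypothesis`, and the example data are hypothetical.

```python
CORNER = (3, 5)  # lower right cell of a hypothetical 4x6 grid, as (row, column)


def corner_objective(agent_pos, coin_pos):
    """Hypothesis 1: success means finishing in the lower right corner."""
    return agent_pos == CORNER


def coin_objective(agent_pos, coin_pos):
    """Hypothesis 2: success means finishing on the coin."""
    return agent_pos == coin_pos


HYPOTHESES = {
    "go to the lower right": corner_objective,
    "get the coin": coin_objective,
}


def best_hypothesis(examples):
    """Score each hypothesis by how many examples it explains correctly.

    Each example is (agent_pos, coin_pos, succeeded). This labeled format
    is an assumption made for the sketch.
    """
    scores = {
        name: sum(objective(agent, coin) == succeeded
                  for agent, coin, succeeded in examples)
        for name, objective in HYPOTHESES.items()
    }
    return max(scores, key=scores.get), scores


# Training data: coin always in the corner, so both hypotheses fit equally.
training_examples = [(CORNER, CORNER, True)] * 3
# New data: the coin has moved, so the hypotheses now disagree.
new_examples = [
    ((0, 2), (0, 2), True),   # agent reached a coin away from the corner
    (CORNER, (1, 1), False),  # agent reached the corner but missed the coin
    ((2, 4), (2, 4), True),
]

print(best_hypothesis(training_examples))
print(best_hypothesis(training_examples + new_examples))
```

With only the training examples, the two hypotheses tie, which is exactly why the spurious one can be learned. Once examples with a relocated coin are added, only the coin hypothesis accounts for all of them.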

In the CoinRun benchmark, it took the ACE agent 50 examples with the coin in different locations before it learned the correct objective was to get the coin, not to go to the lower right. But Stuart Armstrong, Aligned AI’s cofounder and chief technology officer, said he saw good progress with even half that number of examples and that the company’s goal is to get this figure down to what’s called “zero shot” learning, where the AI system will figure out the right objective the first time it encounters data that doesn’t look like its training examples. That would have been what was needed to save the woman killed by the Uber self-driving car.

Aligned AI recently closed a $730,000 angel round of funding that values the startup at $24 million. A patent for ACE is pending, according to Gorman.

Armstrong also said that ACE can help make AI systems more interpretable, since those building an AI system can see what the software thinks its objective is. It might even be possible, in the future, to couple something like ACE with a language model, like the one that powers ChatGPT, to get the algorithm to express the objective in natural language.

Sept. 28: This story has been updated to include more details of Aligned AI’s recently closed angel investment funding.