OpenAI’s deep research can complete 26% of Humanity’s Last Exam—a benchmark for the frontier of human knowledge

By Greg McKenna, News Fellow
February 12, 2025, 1:58 AM ET
Sam Altman, CEO of OpenAI, whose AI agent has set a new standard of performance on Humanity’s Last Exam. (Nathan Laine, Bloomberg/Getty Images)

Artificial intelligence may be more than a quarter of the way to surpassing the boundaries of human knowledge. OpenAI’s new autonomous agent, deep research, has stormed past competing models and set a new standard on Humanity’s Last Exam, a global benchmark created to determine when AI can answer questions on any topic better than a world-class expert in the field.

Deep research successfully completed 26.6% of the recently developed test, which consists of over 3,000 questions across hundreds of subjects ranging from rocket science to analytic philosophy. Powered by OpenAI’s frontier o3 model, the AI agent can synthesize a wide range of information and complete multistep research within five to 30 minutes, its creators say.

OpenAI’s o1 and DeepSeek’s R1 models, which previously sat atop the leaderboard, could only get through roughly 9% of the exam, meaning OpenAI’s new agent represents a nearly threefold jump in performance. The company said the largest gains appeared on inquiries related to chemistry, humanities and social sciences, and mathematics.
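
For readers who want to check the "nearly threefold" figure, the arithmetic can be sketched as below. This is a minimal illustration, not a calculation published by OpenAI: the variable names are made up for the example, and the earlier score is reported only as "roughly 9%," so the result is approximate.

```python
# Scores as reported in the article. The o1/R1 figure is "roughly 9%",
# so it is treated here as an approximation, not an exact value.
deep_research_score = 26.6  # percent of Humanity's Last Exam completed
previous_best_score = 9.0   # percent, approximate

# Relative improvement (how many times higher the new score is).
ratio = deep_research_score / previous_best_score

# Absolute improvement in percentage points.
gain_points = deep_research_score - previous_best_score

print(f"Improvement factor: {ratio:.2f}x")
print(f"Gain: {gain_points:.1f} percentage points")
```

Under these assumptions the factor comes out just under 3, which is consistent with the "nearly threefold" description.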

Frank Downing, a director of research at Cathie Wood’s ARK Invest, noted that OpenAI’s new agent also set a new state-of-the-art score on GAIA, a test for AI assistants that poses real-world questions that are conceptually simple for humans, but challenging for most digital agents. The new offering provides deeper research and analysis, he added, compared with a competing product launched by Google in December.

But all those accomplishments could look minuscule, Downing said, if subsequent models from OpenAI and competitors make progress on solving Humanity’s Last Exam at a pace similar to how weaker AI models conquered previous academic benchmarks.

“Humanity’s Last Exam could be saturated within the next 12 months,” he wrote in a note Monday, “effectively surpassing expert-level technical knowledge and reasoning capability.”

What is Humanity’s Last Exam?

The test is the result of an effort led by Dan Hendrycks, the director of the Center for AI Safety and an advisor for companies such as Scale AI and Elon Musk’s xAI. He had previously created another exam called Massive Multitask Language Understanding, or MMLU, which cutting-edge versions of Anthropic’s Claude, Meta’s Llama, and OpenAI’s ChatGPT have been able to mostly crack as of late last year.

Hendrycks said he was inspired to create Humanity’s Last Exam after a conversation with Musk about existing AI tests being too easy.

“Elon looked at the MMLU questions and said, ‘These are undergrad level. I want things that a world-class expert could do,’” Hendrycks told the New York Times in January.

So Hendrycks, with support from Scale AI, spearheaded a project designed to serve as “the final closed-ended academic benchmark of its kind with broad subject coverage.” His team compiled questions submitted by hundreds of college professors, prize-winning mathematicians, and other experts in their fields.

“[The exam] emphasizes world-class mathematics problems aimed at testing deep reasoning skills broadly applicable across multiple academic areas,” the team wrote in a paper debuting the test in January.

Once models start scoring over 50%, Hendrycks said, it’s safe to say humans have met their match in this regard. After that, the clock is presumably ticking until the world witnesses what is termed artificial general intelligence, or the ability of a machine to possess all the cognitive abilities of humans. OpenAI says it envisions this technology, commonly dubbed AGI, as being capable of producing novel scientific research.

“We are now confident we know how to build AGI as we have traditionally understood it,” OpenAI CEO Sam Altman said in a blog post in January.

On Sunday, Google DeepMind CEO Demis Hassabis said it could arrive in just five years.

“And I think society needs to get ready for that and what implications that will have,” he said in Paris ahead of the AI Action Summit hosted by the city, CNBC reported.

On that front, time seems to be of the essence.

About the Author

Greg McKenna is a news fellow at Fortune.



