• Home
  • Latest
  • Fortune 500
  • Finance
  • Tech
  • Leadership
  • Lifestyle
  • Rankings
  • Multimedia

Trendingnow

1

Elon Musk on MacKenzie Scott giving away $26 billion of her fortune: 'Sadly,' it makes the world a worse place

2

MacKenzie Scott alone accounted for one-third of America's $19.2 billion in megagifts last year

3

Philanthropy leader at Warren Buffett and Bill Gates’ Giving Pledge says children of billionaires are pushing them to give their wealth away faster

1

Elon Musk on MacKenzie Scott giving away $26 billion of her fortune: 'Sadly,' it makes the world a worse place

2

MacKenzie Scott alone accounted for one-third of America's $19.2 billion in megagifts last year

3

Philanthropy leader at Warren Buffett and Bill Gates’ Giving Pledge says children of billionaires are pushing them to give their wealth away faster
AI

AI reasoning models that can ‘think’ are more vulnerable to jailbreak attacks, new research suggests

By
Beatrice Nolan
Beatrice Nolan
Tech Reporter
Down Arrow Button Icon
By
Beatrice Nolan
Beatrice Nolan
Tech Reporter
Down Arrow Button Icon
November 7, 2025, 5:00 PM ET
Getty
Add Fortune on Google for similar content.

New research suggests that advanced AI models may be easier to hack than previously thought, raising concerns about the safety and security of some leading AI models already used by businesses and consumers.

Recommended Video

A joint study from Anthropic, Oxford University, and Stanford undermines the assumption that the more advanced a model becomes at reasoning—its ability to “think” through a user’s requests—the stronger its ability to refuse harmful commands.

Using a method called “Chain-of-Thought Hijacking,” the researchers found that even major commercial AI models can be fooled with an alarmingly high success rate, more than 80% in some tests. The new mode of attack essentially exploits the model’s reasoning steps, or chain-of-thought, to hide harmful commands, effectively tricking the AI into ignoring its built-in safeguards.

These attacks can allow the AI model to skip over its safety guardrails and potentially open the door for it to generate dangerous content, such as instructions for building weapons or leaking sensitive information.

A new jailbreak

Over the last year, large reasoning models have achieved much higher performance by allocating more inference-time compute—meaning they spend more time and resources analyzing each question or prompt before answering, allowing for deeper and more complex reasoning. Previous research suggested this enhanced reasoning might also improve safety by helping models refuse harmful requests. However, the researchers found that the same reasoning capability can be exploited to circumvent safety measures.

According to the research, an attacker could hide a harmful request inside a long sequence of harmless reasoning steps. This tricks the AI by flooding its thought process with benign content, weakening the internal safety checks meant to catch and refuse dangerous prompts. During the hijacking, researchers found that the AI’s attention is mostly focused on the early steps, while the harmful instruction at the end of the prompt is almost completely ignored.

As reasoning length increases, attack success rates jump dramatically. Per the study, success rates jumped from 27% when minimal reasoning is used to 51% at natural reasoning lengths, and soared to 80% or more with extended reasoning chains.

This vulnerability affects nearly every major AI model on the market today, including OpenAI’s GPT, Anthropic’s Claude, Google’s Gemini, and xAI’s Grok. Even models that have been fine-tuned for increased safety, known as “alignment-tuned” models, begin to fail once attackers exploit their internal reasoning layers.

Scaling a model’s reasoning abilities is one of the main ways that AI companies have been able to improve their overall frontier model performance in the last year, after traditional scaling methods appeared to show diminishing gains. Advanced reasoning allows models to tackle more complex questions, helping them act less like pattern-matchers and more like human problem solvers.

One solution the researchers suggest is a type of “reasoning-aware defense.” This approach keeps track of how many of the AI’s safety checks remain active as it thinks through each step of a question. If any step weakens these safety signals, the system penalizes it and brings the AI’s focus back to the potentially harmful part of the prompt. Early tests show this method can restore safety while still allowing the AI to perform well and answer normal questions effectively.

Subscribe to Fortune Gulf Brief. Every Tuesday, this new newsletter delivers clear-eyed, authoritative intelligence on the deals, decisions, policies, and power shifts shaping one of the world’s most consequential regions, written for the people who need to act on it. Sign up here.
About the Author
By Beatrice NolanTech Reporter
Twitter icon

Beatrice Nolan is a tech reporter on Fortune’s AI team, covering artificial intelligence and emerging technologies and their impact on work, industry, and culture. She's based in Fortune's London office and holds a bachelor’s degree in English from the University of York. You can reach her securely via Signal at beatricenolan.08

See full bioRight Arrow Button Icon
Add Fortune on Google for similar content.

Latest in AI

Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025

Most Popular

Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Finance
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam
By Fortune Editors
October 20, 2025
Fortune Secondary Logo
Rankings
  • 100 Best Companies
  • Fortune 500
  • Global 500
  • Fortune 500 Europe
  • Most Powerful Women
  • World's Most Admired Companies
  • See All Rankings
  • Lists Calendar
Sections
  • Finance
  • Fortune Crypto
  • Features
  • Leadership
  • Health
  • Commentary
  • Success
  • Retail
  • Mpw
  • Tech
  • Lifestyle
  • CEO Initiative
  • Asia
  • Politics
  • Conferences
  • Europe
  • Newsletters
  • Personal Finance
  • Environment
  • Magazine
  • Education
Customer Support
  • Frequently Asked Questions
  • Customer Service Portal
  • Privacy Policy
  • Terms Of Use
  • Single Issues For Purchase
  • International Print
Commercial Services
  • Advertising
  • Fortune Brand Studio
  • Fortune Analytics
  • Fortune Conferences
  • Business Development
  • Group Subscriptions
About Us
  • About Us
  • Press Center
  • Work At Fortune
  • Terms And Conditions
  • Site Map
  • About Us
  • Press Center
  • Work At Fortune
  • Terms And Conditions
  • Site Map
  • Facebook icon
  • Twitter icon
  • LinkedIn icon
  • Instagram icon
  • Pinterest icon

Latest in AI

wb
CommentaryLeadership
I grew BDO from $600 million to $3.4 billion. Here’s the 3-part formula that made it possible
By Wayne BersonJune 30, 2026
2 hours ago
vinod
CommentaryData centers
Vinod Khosla: AI’s energy crisis has a fix — and it doesn’t need the grid
By Vinod KhoslaJune 30, 2026
2 hours ago
Jamie Dimon isn’t giving up the top job. That’s turned JPMorgan into a poaching ground for CEO talent
C-SuiteNext to Lead
Jamie Dimon isn’t giving up the top job. That’s turned JPMorgan into a poaching ground for CEO talent
By Ruth UmohJune 30, 2026
2 hours ago
Comcast’s split brings former CFO Michael Angelakis back as CEO
AICFO Daily
Comcast’s split brings former CFO Michael Angelakis back as CEO
By Sheryl EstradaJune 30, 2026
3 hours ago
marc
Commentary250 Years of Innovation
The U.S. Army is opening military bases to private billions — here’s why that changes everything for the next 250 years
By Marc AndersenJune 30, 2026
3 hours ago
Mark Zuckerberg, CEO of Meta
EconomyMarkets
AI stocks are in an ‘air pocket’ and Meta and Microsoft are being traded like ‘bear market names that cannot be owned,’ top analyst says
By Jim EdwardsJune 30, 2026
4 hours ago

Most Popular

Elon Musk on MacKenzie Scott giving away $26 billion of her fortune: 'Sadly,' it makes the world a worse place
Success
Elon Musk on MacKenzie Scott giving away $26 billion of her fortune: 'Sadly,' it makes the world a worse place
By Sydney LakeJune 29, 2026
23 hours ago
MacKenzie Scott alone accounted for one-third of America's $19.2 billion in megagifts last year
Success
MacKenzie Scott alone accounted for one-third of America's $19.2 billion in megagifts last year
By Sydney LakeJune 25, 2026
5 days ago
Philanthropy leader at Warren Buffett and Bill Gates’ Giving Pledge says children of billionaires are pushing them to give their wealth away faster
Success
Philanthropy leader at Warren Buffett and Bill Gates’ Giving Pledge says children of billionaires are pushing them to give their wealth away faster
By Preston ForeJune 27, 2026
3 days ago
The retired college professor fighting a $313 trespassing ticket in Wisconsin thinks he's part of a national struggle
Environment
The retired college professor fighting a $313 trespassing ticket in Wisconsin thinks he's part of a national struggle
By Catherina GioinoJune 28, 2026
2 days ago
'Humanity has chosen to become idiots': This Brown professor switched to take-home exams after a mass shooting and discovered mass cheating
AI
'Humanity has chosen to become idiots': This Brown professor switched to take-home exams after a mass shooting and discovered mass cheating
By Catherina GioinoJune 29, 2026
16 hours ago
Current price of oil as of June 29, 2026
Personal Finance
Current price of oil as of June 29, 2026
By Joseph HostetlerJune 29, 2026
1 day ago

© 2026 Fortune Media IP Limited. All Rights Reserved. Use of this site constitutes acceptance of our Terms of Use and Privacy Policy | CA Notice at Collection and Privacy Notice | Do Not Sell/Share My Personal Information
FORTUNE is a trademark of Fortune Media IP Limited, registered in the U.S. and other countries. FORTUNE may receive compensation for some links to products and services on this website. Offers may be subject to change without notice.