Palo Alto Networks CEO: "AI Found 5 Years of Bugs in 6 Weeks"

ELI5/TLDR

Nikesh Arora, who runs the biggest cybersecurity company in the world, ran an AI tool over his own software for six weeks. It found security holes that would have taken his human team five to seven years to find. The catch: the same AI that’s brilliant at finding holes is unreliable at confirming whether a hole is actually there (it cried wolf 30% of the time), which makes it a fantastic weapon and a clumsy shield. He also thinks this capability is months, not years, away from being freely available to anyone, including attackers, and that a wave of cheap AI software is about to flatten a big chunk of the traditional software industry.

The Full Story

This is an All-In Podcast interview with Nikesh Arora, CEO of Palo Alto Networks. He grew the company from a $17 billion market cap to roughly $238 billion over eight years, and was previously Google’s chief business officer and SoftBank’s president. So when he talks about both AI and enterprise software economics, he’s talking from inside the machine. The conversation has two engines: what AI does to cybersecurity, and what AI does to the business of selling software.

The bug-finding result

The headline number is real and it’s the most concrete claim in the interview. Palo Alto pointed an AI tool (referred to throughout as “Mythos,” an internal or partner project) at its own codebase.

In 6 weeks we found vulnerabilities which would have normally taken us 5 to 7 years to find.

That’s in their own code, not a customer’s. And Palo Alto is, by its own description, a top-percentile company for code testing, because security is the product. So this is a best-case-defended target, and the AI still found years of latent bugs in weeks.

Two details make this more interesting than a vendor brag. First, on “ultra mode” (persistent thinking, where the model keeps grinding until it gets somewhere), the AI could daisy-chain vulnerabilities. A single bug is often harmless on its own. Daisy-chaining means stringing several small, individually boring weaknesses together into an actual route inside the system. That’s the difference between “the lock is slightly loose” and “here’s how I walk from the loose lock to the safe.” Second, the cost: “low millions” of dollars in tokens, and falling. The economics of running this against everything are already plausible.
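Daisy-chaining is easiest to picture as path-finding. The sketch below is only an illustration, not how Mythos works: the footholds, the bug names, and the `find_chain` helper are all hypothetical. Each edge is one minor weakness, and a breadth-first search looks for a route from the outside to something valuable.

```python
from collections import deque

# Hypothetical attack graph: each edge is one individually minor weakness
# that moves an attacker from one foothold to the next.
WEAKNESSES = {
    "internet": [("verbose error page", "web_app")],
    "web_app": [("predictable session token", "user_session")],
    "user_session": [("missing authorization check", "admin_api")],
    "admin_api": [("debug endpoint left enabled", "config_store")],
    "config_store": [],
}


def find_chain(graph, start, target):
    """Breadth-first search for the shortest chain of weaknesses from start to target."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, chain = queue.popleft()
        if node == target:
            return chain
        for bug, next_node in graph.get(node, []):
            if next_node not in seen:
                seen.add(next_node)
                queue.append((next_node, chain + [bug]))
    return None  # no route: the weaknesses do not connect


chain = find_chain(WEAKNESSES, "internet", "config_store")
print(" -> ".join(chain))
```

None of the four bugs reaches the config store alone. The route only exists once they are strung together, which is the “loose lock to the safe” point.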

Why this is a weapon, not a shield

Here’s the part Arora flags as the thing nobody talks about, and it’s the most useful idea in the whole interview. The false positive rate on Mythos was 30%.

So the problem is it’s great for attack. It’s horrible for defense.

The asymmetry is the whole point. If you’re attacking, false positives cost you nothing. The AI flags 100 possible holes, 30 are phantom, you try all 100 anyway, you only need one to be real. But if you’re defending, every false positive is a fire drill: you scramble to patch a hole that was never there. As one host put it, “no missile inbound.” Run that at scale and your security team drowns in phantom alerts.
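The arithmetic behind that asymmetry can be written down in a few lines. This is a minimal sketch using the 100-flag, 30% example from the interview. The `hours_per_flag` triage cost is an assumed figure for illustration only, not something Arora stated.

```python
def triage_outcome(flags, false_positive_pct, hours_per_flag):
    """Split flagged findings into real and phantom, and price the phantom ones."""
    phantom = flags * false_positive_pct // 100
    real = flags - phantom
    wasted_hours = phantom * hours_per_flag  # defender time spent on holes that never existed
    return real, phantom, wasted_hours


# 100 flags at a 30% false positive rate; 4 hours per flag is a hypothetical triage cost.
real, phantom, wasted_hours = triage_outcome(flags=100, false_positive_pct=30, hours_per_flag=4)

# Attacker: phantom flags are simply failed attempts; one real hole is enough.
attacker_has_a_way_in = real >= 1

# Defender: every phantom flag still has to be investigated before it can be dismissed.
print(f"real={real} phantom={phantom} wasted defender hours={wasted_hours}")
print(f"attacker has a way in: {attacker_has_a_way_in}")
```

Under these assumptions the attacker walks away with 70 candidate entry points, while the defender burns 120 hours proving that 30 alerts were nothing.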

He generalizes the lesson beyond security. Plug a raw model into any business process where being wrong costs money, and a 10-20% false positive rate is ruinous. His examples: an AI paying insurance claims (every false positive is money out the door), or a self-driving car.

I’m not putting my kids in that car with a 10% false positive rate. Are you?

So the real work, he argues, isn’t getting a smarter model. It’s the engineering around the model, the harnesses, the training, the context, that drives false positives toward zero without letting real threats slip through (false negatives). That post-model layer is where the actual value gets built.
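The tension he describes shows up as soon as you filter findings by confidence. The sketch below is a toy, with made-up scores and labels and a hypothetical `evaluate` helper. It treats the “false positive rate” as the share of flagged findings that turn out to be phantom, which is one reading of the 30% figure.

```python
# Hypothetical findings: (confidence score from some verifier, whether the bug is real).
FINDINGS = [
    (0.95, True), (0.90, True), (0.85, False), (0.80, True), (0.70, True),
    (0.60, False), (0.55, True), (0.40, False), (0.30, True), (0.20, False),
]


def evaluate(findings, threshold):
    """Return (share of flagged findings that are phantom, count of real bugs dropped)."""
    flagged = [real for score, real in findings if score >= threshold]
    dropped = [real for score, real in findings if score < threshold]
    false_positives = flagged.count(False)
    false_negatives = dropped.count(True)
    phantom_share = false_positives / len(flagged) if flagged else 0.0
    return phantom_share, false_negatives


for threshold in (0.25, 0.50, 0.75, 0.90):
    phantom_share, missed = evaluate(FINDINGS, threshold)
    print(f"threshold={threshold:.2f}  phantom share={phantom_share:.0%}  real bugs missed={missed}")
```

On this toy data, raising the threshold from 0.25 to 0.90 takes the phantom share from one in three down to zero, but the number of real bugs missed climbs from 0 to 4. A bare threshold only trades one error for the other. The harness work Arora describes is about improving the scores themselves so both numbers fall together.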

The race, and where the real danger lives

We’re now in a race between defenders finding-and-patching holes and attackers finding-and-exploiting them. Arora’s read on how we’re doing: “not as well as we should be doing, which is great for our business, but that’s a different story.” He thinks this capability is roughly three months from being available in the wild via open-source or Chinese models, faster than the six months people had assumed, because the frontier models already out (he name-checks “4.8” and “5.5”) have similar capability. And you don’t need to crack the hardest target, just find a soft one. An old industrial control system running ancient code on the edge is an easy mark.

But the danger he actually worries about isn’t the dramatic one. It’s not someone cracking a PG&E power plant. National-security targets are well-defended (they spend ~10% of budget on it). The soft underbelly is the dentist’s office, the doctor’s office, the small business running some packaged software it never patches. His reference point is the Change Healthcare ransomware breach, which took down one clearinghouse and effectively shut down physician offices across the country, forcing UnitedHealth to front billions in credits.

It’s less about cracking some PG&E power generation facility. It’s more economic chaos.

He’s also skeptical that holding back powerful models helps. The reason is almost comically physical: a model-company CEO told him the entire weights of their newest frontier model fit on a USB stick, and the thing can be distilled into a copycat in 24 to 48 hours. If that’s the IP, you can’t keep it in a vault for six months. Someone open-sources an equivalent and the head start evaporates.

The SaaS reckoning

The second engine of the conversation is what AI does to the software business. Arora sorts software into buckets and hands down verdicts.

Analytical SaaS is dead. The classic pitch was “give me your data and I’ll analyze it for you.” Now you can point a model at your own data and analyze it yourself.

If you’re an analytical SaaS company, it’s over.

The host offered a live anecdote: a SaaS product with 20 seats nobody logged into, the data just sitting there. They cut to three accounts, wired the data to Slack and Claude so anyone could query it in plain English, and cut the bill 90%. That’s the apocalypse in miniature. These vendors lose pricing power, because a customer can now say “I’ll put ten developers on this and save ten million dollars.”

Infrastructure software is undervalued. This is the plumbing: databases and storage from the likes of Databricks, Snowflake, MongoDB, and Oracle. His claim is that enterprises will need 10x the data stored over the next three years, because AI systems need memory and context to learn what good and bad look like (the same data-hunger applies to cyber defense). Anything that helps you collect and manage data gets more valuable.

Systems of work / record get re-engineered. Salesforce, Oracle, the deeply-embedded operational software. His provocative line:

UI is the worst thing we did as technologists.

The argument: today armies of product managers build interfaces so humans can poke at the data behind them. If agents can do the poking, the UI disappears. The salesperson just says “I had the call, do the paperwork,” and the agent files everything into the back-end systems. It’s already happening passively: agents are reading email and ingesting Zoom transcripts, so there’s no data entry at all. Bonus: the audit trail gets cleaner because humans stop touching the data.

The model layer vs the application layer

This is where Arora plants a flag against the OpenAI worldview. He doesn’t think raw models are where the money is. He thinks models become a utility, intelligence you buy by the drink. Need a dumb task done? Buy “120 IQ” for a cent. Need a hard one? Buy “250 IQ” for ten dollars. The profit, he insists, sits in applications, because most companies have no idea how to use raw models, and 50,000 companies all need the same HR system or sales system. Nobody sane rebuilds their entire sales stack on top of OpenAI directly.

It’s silly for me to use OpenAI directly and rewrite my entire sales system because I’m smart, right? I’m not. I want somebody to do it for me.

The counter-question, raised by the hosts: doesn’t OpenAI/Anthropic eventually become the new Microsoft Office, releasing legal models, accounting models, the application layer itself, to hit their revenue numbers? Arora’s bet is that a still-unformed layer of arbitrage companies will sit between the models and the business problem, building the harnesses and memory. He concedes that layer isn’t fully built yet.

Where’s the money to be made? Two beautiful targets. Replacement TAMs (total addressable markets): if you replace existing software, the customer already has the budget, so the sale is easy. Consumer revenue: it’s easy to get five bucks a month per user, and we all already pay more in subscriptions than we ever did on cable.

The contrarian endnote

Two final counterpoints to the prevailing AI narrative. First, hardware isn’t dying. It’s still the cheapest way to move low-latency, high-throughput bits, which is literally what a data center is. He notes financial services firms (Goldman, JPMorgan) resist the cloud precisely because added latency costs them money. And the bottleneck on new hardware isn’t design, it’s production: everything is backordered as the world races to build GPU-laden data centers. Dell, once written off, is back near a $300-400 billion market cap.

Second, and most against-the-grain: he expects to employ more people, not fewer. “I think we’re going to have more people at Palo Alto on the technology side than we’ve ever had before,” because AI forces every system to be transformed, and someone has to do the transforming.

Key Takeaways

AI ran over Palo Alto’s own (well-tested) codebase for six weeks and found vulnerabilities that would have taken human teams five to seven years to surface. Cost: “low millions” of dollars in tokens, and falling.
“Ultra mode” / persistent thinking lets the AI daisy-chain individually harmless bugs into a real attack path, the hard part of actual exploitation.
The core asymmetry: AI is great for offense, poor for defense, because a high false-positive rate (30% on Mythos) is free for an attacker but a constant fire drill for a defender.
The real value isn’t a smarter model, it’s the post-model engineering (harnesses, training, context) that pushes false positives toward zero without raising false negatives.
This capability is ~3 months from the wild via open-source/Chinese models, not the 6 months people assumed, because frontier models already out have similar capability.
You don’t need to crack hard targets. Old industrial/OT systems on the edge are easy marks.
The danger isn’t dramatic infrastructure attacks (those targets are well-defended). It’s economic chaos via soft targets (small offices, unpatched packaged software), as in the Change Healthcare ransomware breach that froze physician offices nationwide.
89% of breaches still come from rudimentary causes (stolen credentials, weak passwords), not exotic exploits.
Holding models back is futile: a frontier model’s entire weights fit on a USB stick and can be distilled into a copycat in 24-48 hours.
Arora’s SaaS verdict: analytical SaaS is dead; infrastructure/data software is undervalued (enterprises need ~10x data storage in 3 years); systems-of-work get re-engineered as agents remove the UI.
His model-economics thesis: models become a metered utility; profit pools live in the application layer, served by a still-forming class of arbitrage companies, not by enterprises wiring up raw models themselves.
Contrarian closers: hardware isn’t dying (latency economics; production, not design, is the bottleneck), and he expects to hire more technical staff because AI forces every system to be rebuilt.

Claude’s Take

This is a strong interview carried by one genuinely load-bearing insight and a lot of confident, self-interested forecasting. Score it a 7.

The offense/defense asymmetry is the real thing here, and it’s worth internalizing. “AI is great at finding holes and bad at confirming them, so it favors attackers” is a clean, durable mental model that survives whichever model number is current. The Change Healthcare framing (the danger is economic disruption of soft targets, not Hollywood infrastructure attacks) is also a useful corrective to the usual cyber-doom narrative, and the “89% of breaches are stolen credentials” stat is a healthy splash of cold water on the whole conversation.

Where to apply the BS filter. The “5 to 7 years in 6 weeks” headline is impressive but unfalsifiable: it’s a counterfactual estimate from the vendor whose stock benefits from cyber-anxiety, and he cheerfully admits the race going badly “is great for our business.” His SaaS verdicts (“analytical SaaS is over,” “infrastructure undervalued”) are the kind of clean, quotable taxonomy that sounds more settled than it is, and they conveniently route value toward the infrastructure-and-applications world rather than the model labs. The model-as-utility thesis is a real intellectual position, but it’s also exactly what you’d argue if you were not a model company and wanted to believe the model labs won’t eat the application layer. That debate (do OpenAI/Anthropic become the new Microsoft Office?) is genuinely open, and his confidence outruns the evidence, which the hosts to their credit keep poking at.

Net: the cybersecurity half is the keeper. The software-business half is sharp, plausible, and should be read as one very smart, very interested party’s bet, not as a forecast. Docked a point for the talking-his-book texture, and the format (rapid-fire across cyber, SaaS, hardware, M&A) means nothing gets fully pressure-tested.