The Only Defense Against AI with Uma Roy
ELI5/TLDR
AI can now fake any image, video, or voice convincingly, and software that tries to detect AI fakes mostly does not work. Uma Roy (CEO of Succinct, the company behind the fastest zero-knowledge virtual machine) argues that the real defense is not better detection but cryptographic provenance — every photo signed by the camera the moment it is captured, every edit logged, every published piece traceable back to a verifiable origin. Balaji Srinivasan calls this the “ledger of record” and adds that as AI shrinks the cost of producing content, the cost of verifying it explodes — pushing everyone into smaller “digital tribes” where trust is local. The conversation is half engineering blueprint, half manifesto for what news, journalism, and identity look like when “real” has to be proved.
The Full Story
The detector arms race is already lost
Succinct ran a benchmarking study against the major commercial AI-image detectors. Take a synthetic image, apply a tiny perturbation — a slight blur, a crop, imperceptible Gaussian noise — and the detector’s confidence drops from “this is AI” to “this is real.” It works on receipts (where someone has used AI to inflate the numbers), on cars with AI-added damage for insurance fraud, on doctored editorial photos. The pattern holds.
Our conclusion from this study was that AI detection is a dead end.
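The shape of such a perturbation test can be sketched in a few lines of Python. This is a minimal illustration, not Succinct's benchmark code: `toy_detector`, the checkerboard "image", and the 0.5 threshold are all hypothetical stand-ins, and a real run would swap in a commercial detector and real images.

```python
import random

def box_blur(img, radius=1):
    """Replace each pixel by the mean of its (2r+1)x(2r+1) neighbourhood."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            row.append(sum(window) / len(window))
        out.append(row)
    return out

def center_crop(img, margin=2):
    """Trim `margin` pixels from every side."""
    return [row[margin:-margin] for row in img[margin:-margin]]

def gaussian_noise(img, sigma=0.01, seed=0):
    """Add small zero-mean noise, clamped to the valid [0, 1] range."""
    rng = random.Random(seed)
    return [[min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row] for row in img]

def toy_detector(img):
    """HYPOTHETICAL stand-in: P(AI) grows with mean horizontal pixel contrast."""
    diffs = [abs(row[x] - row[x + 1]) for row in img for x in range(len(row) - 1)]
    return min(1.0, (sum(diffs) / len(diffs)) / 0.5)

PERTURBATIONS = {"blur": box_blur, "crop": center_crop, "noise": gaussian_noise}

def robustness_report(img, detector, threshold=0.5):
    """Score the image before and after each perturbation; flag verdict flips."""
    base = detector(img)
    report = {"original": (base, False)}
    for name, perturb in PERTURBATIONS.items():
        score = detector(perturb(img))
        report[name] = (score, (score >= threshold) != (base >= threshold))
    return report

# A 16x16 checkerboard standing in for a synthetic image with a pixel-level artifact.
fake = [[0.7 if (x + y) % 2 == 0 else 0.3 for x in range(16)] for y in range(16)]
for name, (score, flipped) in robustness_report(fake, toy_detector).items():
    print(f"{name:9s} score={score:.2f} flipped={flipped}")
```

Because the toy detector keys on a single feature, only the blur flips its verdict here. The harness is the point: score once, perturb, score again, and count how often the label changes.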
The reason is structural. Generative models are trained on a loss function that explicitly minimises the distance between what they output and the manifold of real data. Asking a separate model to spot the difference is asking it to detect what the first model was specifically optimised to make undetectable. The decision boundaries in such high-dimensional spaces are full of holes — the same property that lets adversarial examples flip an image of a panda into “gibbon” with a layer of noise too faint to see.
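Why a faint perturbation can flip a verdict is easiest to see on a linear classifier, using the fast gradient sign step from the adversarial-examples literature. The sketch below is an illustration only: the weights, bias, image size, and step size are made-up numbers, and real detectors are deep networks rather than a single dot product.

```python
def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def score(w, b, x):
    """Linear detector: positive means 'AI', negative means 'real'."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def fgsm_step(w, x, eps):
    """Move every coordinate by eps against the gradient of the score.

    For a linear score the gradient with respect to x is just w, so the
    step is -eps * sign(w) and the score drops by eps * sum(|w_i|).
    """
    return [xi - eps * sign(wi) for wi, xi in zip(w, x)]

n = 10_000                                   # number of pixels (hypothetical)
w = [0.01 if i % 2 == 0 else -0.01 for i in range(n)]
b = 1.0
x = [0.5] * n                                # mid-grey image, pixel range [0, 1]

x_adv = fgsm_step(w, x, eps=0.02)            # each pixel moves by 2% of the range
print(score(w, b, x) > 0, score(w, b, x_adv) > 0)
```

No single pixel moves by more than 0.02, yet the score shifts by 0.02 times the sum of the absolute weights, which is 2.0 here and enough to cross the boundary. The shift grows with the number of dimensions, which is one way to read the "full of holes" remark.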
Roy concedes that text is a partially happier story (em-dashes, the “it’s not X, it’s Y” cadence, the overdramatic posture all give ChatGPT slop away to a careful reader), but image and video are heading the wrong direction fast.
The provable tech stack
Roy’s pitch is to fold cryptography into every step of how content is made and shipped:
- Cryptographic capture. A camera or phone has a hardware-rooted private key. The moment a photo is taken, the chip signs the raw sensor data, binding the content to a specific device, time, and location. Every iPhone already has a Secure Enclave capable of doing this.
- Chain of custody for edits. Every transformation — crop, colour-correct, retouch — gets a cryptographically signed manifest, appended in order. Think of it like git for an image, where each commit is provable.
- Ledger of record. When the content is published, both the original signature and the full edit manifest go onto an unbiased permanent ledger.
- Public verification. When you scroll past the image on Instagram or X, the front end has already checked the signatures against the ledger. You see a “pink check mark” — analogous to today’s blue check on accounts, but applied to content.
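The four layers above can be sketched end to end as follows. This is a toy under loud assumptions: HMAC with a shared secret stands in for the hardware-rooted asymmetric signature (a real verifier would hold only a public key), a Python dict stands in for the ledger, and every name and field here is hypothetical rather than Succinct's SDK or the C2PA format.

```python
import hashlib
import hmac
import json

def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def sign(key: bytes, claim: dict) -> str:
    """STAND-IN signature: HMAC over the canonical JSON form of the claim."""
    message = json.dumps(claim, sort_keys=True).encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()

def capture(key, device_id, raw, time, location):
    """Layer 1: bind content hash, device, time, and place at capture."""
    claim = {"kind": "capture", "signer": device_id, "time": time,
             "location": location, "content": digest(raw), "prev": None}
    return {"claim": claim, "sig": sign(key, claim)}

def record_edit(key, editor_id, prev_entry, op, edited):
    """Layer 2: each edit signs the new content hash plus the previous signature."""
    claim = {"kind": "edit", "signer": editor_id, "op": op,
             "content": digest(edited), "prev": prev_entry["sig"]}
    return {"claim": claim, "sig": sign(key, claim)}

def publish(ledger, manifest):
    """Layer 3: file the manifest under the hash of the published bytes."""
    ledger[manifest[-1]["claim"]["content"]] = manifest

def verify(ledger, keys, published):
    """Layer 4: look up the bytes, then check every signature and link."""
    manifest = ledger.get(digest(published))
    if not manifest or manifest[0]["claim"]["kind"] != "capture":
        return False
    if manifest[-1]["claim"]["content"] != digest(published):
        return False
    prev_sig = None
    for entry in manifest:
        claim = entry["claim"]
        key = keys.get(claim["signer"])
        if key is None or claim["prev"] != prev_sig:
            return False
        if not hmac.compare_digest(entry["sig"], sign(key, claim)):
            return False
        prev_sig = entry["sig"]
    return True

keys = {"camera-01": b"device-secret", "editor-01": b"editor-secret"}  # hypothetical
ledger = {}

shot = capture(keys["camera-01"], "camera-01", b"raw sensor bytes",
               "2026-03-01T12:00:00+00:00", [10.0, 20.0])
crop = record_edit(keys["editor-01"], "editor-01", shot, "crop", b"cropped bytes")
publish(ledger, [shot, crop])

print(verify(ledger, keys, b"cropped bytes"))   # signed chain found and intact
print(verify(ledger, keys, b"unknown bytes"))   # no manifest on the ledger
```

Chaining each entry to the previous signature is what gives the "git for an image" property: altering or reordering any step invalidates every signature check after it.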
Adam Mosseri (head of Instagram) reportedly posted in early 2026 that the platform’s framing was shifting from “assume what you see is real” to “start with skepticism,” with cryptographic fingerprinting of real media as the path forward. Roy reads this as Big Social finally arriving at what Balaji had been describing for half a decade.
Camera manufacturers will cryptographically sign images at capture, creating a chain of custody.
Why detection had to lose and provenance has to win
A useful frame from Roy: AI generation is upstream of AI detection in the training loop. Provenance sidesteps this entirely. It does not ask “is this fake?” — a question that gets harder every quarter — it asks “can you show me where this came from?” — a question whose answer is binary and durable.
There is still an “analog hole.” You can hold a real camera up to a screen showing a generated image, and the signature will say “real photo, taken at this place and time.” So a human attestation always sits at the very base of the trust chain. But that is the same trust model human society has always run on. The point is to push the unverifiable bit as far down the stack as possible, so most attacks become expensive enough not to bother.
Tribes, not platforms
Balaji’s overlay on this: AI does not just degrade the information environment, it reshapes its geometry. Inside a tribe — a company, a religious community, a network state — AI is a productivity miracle. It can read every Slack message, surface a remark from three years ago, find security holes in a five-year-old commit. That is “indexing,” and it is consensual.
Outside the tribe, the same tools become spam, scams, slop, fraud. The cost of producing a plausible message goes to zero; the cost of verifying it goes to the moon. So everyone retreats into smaller circles where they can vouch for who they are talking to.
The future is a billion-person Chinese superstate or a thousand million-person network states.
China is the largest digital tribe by virtue of central moderation — they can roll out a single AI-detection regime across WeChat for a billion users overnight. The rest of the world has to assemble trust bottom-up, which is where cryptographic provenance plus a “web of trust” (you trust me because someone you trust trusts me) becomes the substitute for a central choke point.
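The "someone you trust trusts me" rule amounts to a bounded search over a graph of vouches. Here is a minimal sketch, assuming vouches are one-directional and that trust is not extended beyond a fixed number of hops; the names, the graph, and the hop limit are invented for illustration, and in a real system each edge would itself be a signed statement.

```python
from collections import deque

def trust_hops(vouches, me, target, max_hops=3):
    """Return the fewest vouching hops from me to target, or None.

    `vouches` maps each person to the people they personally vouch for.
    """
    if me == target:
        return 0
    seen = {me}
    queue = deque([(me, 0)])
    while queue:
        person, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not extend trust past the hop limit
        for friend in vouches.get(person, ()):
            if friend == target:
                return hops + 1
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, hops + 1))
    return None

vouches = {
    "you": ["alice"],
    "alice": ["bob"],
    "bob": ["carol"],
    "mallory": ["you"],   # one-way: mallory vouches for you, not the reverse
}
print(trust_hops(vouches, "you", "bob"))
print(trust_hops(vouches, "you", "mallory"))
```

The hop limit is the design knob: a small value keeps the tribe tight, a larger one extends reach at the cost of trusting people further removed from anyone you know.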
A concrete near-term application: NS News
The two of them spend a chunk of the back half talking about funding decentralised journalism with the provable stack underneath. The model: instead of one full-time journalist on $50,000 covering everything, fifty domain experts collecting bounties for the one or two stories per year they actually know about — paid in cryptocurrency, verified with cryptography. Coverage focused on networks, startup societies, biotech, technology — areas where mainstream outlets either lack expertise or have hostile incentives.
The provenance stack makes this work because the bounty buyer can verify the reporter actually was where they said they were, with a real device, on a real day. C2PA — the existing metadata standard for content provenance — slots in here as the format for tracking edits.
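A bounty buyer's check might look roughly like the sketch below. It assumes the capture signature has already been verified and only compares the signed metadata against the bounty's terms; the field names, the device allowlist, and the radius rule are hypothetical and not taken from C2PA or from any NS News specification.

```python
import math
from datetime import datetime

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def claim_matches_bounty(capture, bounty, trusted_devices):
    """Real device, real day, right place: all three must hold."""
    when = datetime.fromisoformat(capture["time"])
    return (capture["device"] in trusted_devices
            and bounty["start"] <= when <= bounty["end"]
            and haversine_km(capture["location"], bounty["location"]) <= bounty["radius_km"])

bounty = {                                    # hypothetical bounty terms
    "location": (10.0, 20.0),
    "radius_km": 5.0,
    "start": datetime.fromisoformat("2026-03-01T00:00:00+00:00"),
    "end": datetime.fromisoformat("2026-03-02T00:00:00+00:00"),
}
capture = {"device": "camera-01", "time": "2026-03-01T12:00:00+00:00",
           "location": (10.01, 20.01)}

print(claim_matches_bounty(capture, bounty, {"camera-01"}))
```

Keeping this check separate from signature verification means the same signed capture can be tested against many different bounties without touching the cryptography.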
What is and is not being claimed
Roy is careful: she is not arguing every Instagram post should be required to carry a real-content signature. AI-generated images can be art, can be entertainment, can be useful. The proposal is opt-in. If you want to prove something is real, the tools should exist and be cheap to use. The high-stakes wedge — political ads, war reporting, presidential video, insurance claims — is where the technology has to land first because that is where the harm is concentrated.
Key Takeaways
- AI detectors fail under simple perturbations (blur, crop, light noise). Succinct’s published benchmark at aidetection.succinct.xyz is the receipt.
- The reason detection is a dead end is structural: generators are explicitly trained to minimise the distance from real content. Detection is the inverse problem they were optimised against.
- Provable provenance has four layers: cryptographic capture (signed at the device), chain-of-custody edits (signed manifests), ledger of record (unbiased shared store), and public verification (front-end checks the chain).
- Every iPhone already has the Secure Enclave hardware to do step 1. The blocker is software — Succinct is building an SDK to expose it consistently across device types.
- The “analog hole” — pointing a real camera at a fake screen — is unfixable in principle, so a human attestation always sits at the base. The goal is to make attacks expensive, not impossible.
- C2PA is the existing metadata standard for content provenance and slots into the chain-of-custody layer.
- Adam Mosseri (Instagram) publicly shifted in early 2026 from “assume real” to “start with skepticism” — signalling that platform incentives are starting to align with the provenance stack.
- AI’s real second-order effect is not misinformation but tribalisation: the production cost of content goes to zero, the verification cost goes to infinity, so trust shrinks to people you already know.
- “Web three of trust” — using cryptographic signing to extend webs of personal trust further than human social bandwidth allows — is Balaji’s name for the social layer that sits on top of the technical stack.
- Decentralised journalism using bounties paid in crypto and verified with cryptography (50 part-time domain experts, not 1 full-time generalist) is the use case both speakers are most excited to fund near-term.
Claude’s Take
Score: 7/10. The technical diagnosis is sharp and probably correct — detection genuinely is losing the arms race for structural reasons, and the provable-stack alternative is a real engineering programme rather than vapourware. Succinct shipping benchmarks and an SDK is evidence of operating tempo, not just thesis-tweeting.
The conversation also has the predictable Balaji shape, where every concrete technical question gets pulled into a larger civilisational frame within thirty seconds. Sometimes that’s productive — the “cost of production collapses, cost of verification explodes” framing is a genuinely useful way to think about why social trust is reorganising. Sometimes it lapses into prescription dressed as prediction (the “billion-person Chinese superstate or thousand million-person network states” line is a thesis, not a forecast). The two of them are in mutual-admiration mode throughout, which thins the friction the ideas would benefit from.
The thing that doesn’t get pushed on enough: who controls the ledger? “Unbiased permanent ledger” gestures at a blockchain, but in practice that means someone’s protocol with someone’s economic incentives, and the same Web3 capture failures that plagued the previous cycle (rent extraction, gatekeeping, bad UX) apply here. There’s also a real question about adversarial users tampering with hardware before the chip ever signs anything — the “analog hole upstream” Balaji mentions in passing — that doesn’t get a serious answer.
Worth watching for the technical exposition of provenance and the meta-frame on how AI changes social geometry. Discount the political superstructure to taste.
Further Reading
- C2PA / Content Authenticity Initiative — the existing metadata standard for cryptographic content provenance; the layer Succinct’s stack plugs into.
- Explaining and Harnessing Adversarial Examples (Goodfellow et al., 2014) — the canonical paper on why classifiers are fragile under perturbation; the source of the panda example.
- Succinct’s SP1 ZKVM — succinct.xyz; the engineering substrate beneath the provenance pitch.
- The Network State (Balaji Srinivasan, 2022) — book-length version of the digital-tribes argument that runs throughout the conversation.
- Instagram on AI content (Adam Mosseri, 2026) — the early-2026 statement Roy quotes; useful as a marker of where mainstream platforms are landing.