
YouTube

Turing Award Winner: Postgres, Disagreeing with Google, Future Problems | Mike Stonebraker

Ryan Peterman published 2026-04-20 added 2026-04-23 score 8/10
databases postgres systems history-of-computing llms text-to-sql distributed-systems

ELI5/TLDR

Mike Stonebraker — the guy who built Postgres (the database under half the internet) — tells the story of how it happened: a bond trader’s weird calendar math pushed him to invent databases you could extend with your own data types. Along the way he roasts Oracle’s Larry Ellison for lying to customers, roasts Google for pushing MapReduce and “eventually consistent” databases that were fundamentally broken, and roasts Amazon for running fifteen databases when they needed three. His latest punchline: today’s language models score zero percent on the hard real-world version of turning English into SQL, because the benchmarks everyone is bragging about are toy problems with clean data.

The Full Story

Stonebraker is 82, Turing-winning, and has opinions. The conversation runs through how the relational database actually got built, who he thinks was wrong about what, and what he’s chasing now. Throughout, one thread: the database should eat everything around it.

How a bond trader broke Ingres

In 1971, a year after Ted Codd’s paper that invented the relational database, Stonebraker and his mentor Gene Wong at Berkeley decided to build one. The competitors were clunky — IBM’s IMS forced you to organize data as trees, and the Codasyl proposal made you follow pointers by hand. Codd’s idea — tables, rows, queries — was clearly better. They called their version Ingres, and it got Stonebraker tenure in 1976.

But Ingres had a ceiling. The thing that exposed it was a phone call from a bond trader, around 1985. Ingres had just added dates — Gregorian calendar, the way a normal human thinks. March 15 minus February 15 is 28 days. Fine. Except bond math doesn’t work that way. In bond land, every month has 30 days. Always. That’s just the convention.

The trader wanted to overload subtraction. Stonebraker couldn’t let him. Dates were hardcoded.

And so he had to retrieve two dates out to user code, do the subtraction in user code, put the answer back, and it cost him a factor of two or three in efficiency.
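To make the trader's complaint concrete, here is a minimal sketch of the two subtractions. The function names are illustrative, and the 30/360 rule shown is a simplified one (day 31 is clamped to 30); real market conventions differ in their end-of-month handling.

```python
from datetime import date

def actual_days(start: date, end: date) -> int:
    """Calendar (Gregorian) difference: what a general-purpose date type returns."""
    return (end - start).days

def bond_days_30_360(start: date, end: date) -> int:
    """Simplified 30/360 day count: every month is treated as 30 days."""
    d1 = min(start.day, 30)  # assumption: clamp the 31st to the 30th
    d2 = min(end.day, 30)
    return (end.year - start.year) * 360 + (end.month - start.month) * 30 + (d2 - d1)

feb15 = date(2023, 2, 15)
mar15 = date(2023, 3, 15)
print(actual_days(feb15, mar15))       # 28
print(bond_days_30_360(feb15, mar15))  # 30
```

Same two dates, two different answers, and in Ingres only the first could run inside the database.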

Around the same time, a neighboring professor wanted a geographic information system — maps, points, lines, polygons. Ingres couldn’t hold those efficiently either. Two different customers, same problem: the database only understood numbers and text. Everyone else needed weirder shapes.

So Postgres (started 1986) was designed to let you add your own data types. Bond time, map polygons, whatever. This is why Postgres quietly became the database that swallowed the world — PostGIS is the reason it owns mapping, for instance. It wasn’t the speed; it was the flexibility.
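The idea of pushing user logic into the engine can be sketched with Python's built-in sqlite3 module, which lets a program register its own SQL function. This is only an analogy: Postgres itself does this with statements such as CREATE TYPE, CREATE FUNCTION, and CREATE OPERATOR, and the table, column, and function names below are made up for illustration.

```python
import sqlite3
from datetime import date

def bond_days(start_iso: str, end_iso: str) -> int:
    """Simplified 30/360 difference between two ISO-format date strings."""
    s, e = date.fromisoformat(start_iso), date.fromisoformat(end_iso)
    return ((e.year - s.year) * 360 + (e.month - s.month) * 30
            + (min(e.day, 30) - min(s.day, 30)))

conn = sqlite3.connect(":memory:")
# Register the Python function so SQL can call it by name (hypothetical name).
conn.create_function("bond_days", 2, bond_days)
conn.execute("CREATE TABLE trades (id INTEGER PRIMARY KEY, settle TEXT, maturity TEXT)")
conn.execute("INSERT INTO trades (settle, maturity) VALUES ('2023-02-15', '2023-03-15')")

# The subtraction is now part of the query, instead of being done in
# application code after fetching both dates.
rows = conn.execute("SELECT id, bond_days(settle, maturity) FROM trades").fetchall()
print(rows)  # [(1, 30)]
```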

Oracle’s manual-page move

Stonebraker doesn’t hide his opinion of Larry Ellison.

Larry Ellison is a fabulous salesman. At the time he made present tense and future tense indistinguishable. So he basically lied to customers.

His favorite example: referential integrity. That’s the feature that says “if you fire the last person in a department, the database has to decide what to do with the now-empty department.” Ingres implemented it. Oracle put two manual pages in their docs describing it — and at the bottom, in small type, “not yet implemented.” They were selling a promise and hoping customers would help them build it.
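For readers who have not met the feature, here is a small sketch of referential integrity in its most common form, a foreign key, using Python's sqlite3 module. The schema and names are invented for illustration; the point is only that the database itself refuses changes that would leave a dangling reference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE emp ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " dept_id INTEGER NOT NULL REFERENCES dept(id))"
)
conn.execute("INSERT INTO dept (id, name) VALUES (1, 'Shoe')")
conn.execute("INSERT INTO emp (name, dept_id) VALUES ('Alice', 1)")
conn.commit()

def try_sql(sql: str) -> bool:
    """Run one statement; return False if the database rejects it."""
    try:
        conn.execute(sql)
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        conn.rollback()
        return False

# An employee pointing at a department that does not exist is rejected.
orphan_insert_ok = try_sql("INSERT INTO emp (name, dept_id) VALUES ('Bob', 99)")
# Deleting a department that still has employees is rejected too.
delete_in_use_ok = try_sql("DELETE FROM dept WHERE id = 1")
print(orphan_insert_ok, delete_in_use_ok)  # False False
```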

When Oracle later acquired MySQL (through its purchase of Sun Microsystems), the open-source world got nervous. That’s when Postgres became the default.

Why Google was wrong twice

Around 2004, Google popularized MapReduce and the whole data industry swooned. Hadoop became the fashionable thing. Stonebraker, Dave DeWitt, and several co-authors published a paper in 2010 arguing the emperor had no clothes: a proper distributed database would beat the pants off Hadoop. It did. Google eventually abandoned MapReduce.

The second Google mistake was worse: eventual consistency. To explain what that is, imagine a company with a warehouse on each coast. Each warehouse has its own database. When you sell a widget, you want both databases to agree.

The expensive way: every sale triggers a conversation across the country. The west coast says “I’m selling one.” The east coast says “okay, confirmed.” Both commit together. Nobody can oversell. But all that cross-country chatter is slow.

The cheap way — eventual consistency — is: sell the widget, fire off a message, assume the other coast will catch up in a minute. Usually fine. The problem:

If the east coast guy and the west coast guy simultaneously sell the last widget, then eventually the state of the warehouse will be minus one, and somebody won’t get their widget.

For Amazon, “ships in 24 hours” gives them room to apologize later. For a bank, a stock exchange, a hospital — you can’t have “minus one.” You can’t sell what doesn’t exist. Google’s own engineers, led by Jeff Dean, eventually admitted this and rebuilt things around Spanner, which put proper transactions back in. Stonebraker sees it as a decade-long detour the industry is still cleaning up after.
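The two-warehouse story can be replayed in a few lines. This is a toy model under stated assumptions: the Replica class and both selling functions are hypothetical, there is no real network, and "coordination" is reduced to checking both copies before changing either.

```python
class Replica:
    """One warehouse's local copy of the stock count (hypothetical model)."""

    def __init__(self, stock: int):
        self.stock = stock
        self.outbox = []  # sales not yet sent to the other coast

    def sell_locally(self) -> bool:
        # Eventual consistency: check only the local copy, then commit.
        if self.stock <= 0:
            return False
        self.stock -= 1
        self.outbox.append(-1)
        return True

def sync(a: Replica, b: Replica) -> None:
    """Deliver each side's pending sales to the other side."""
    for delta in a.outbox:
        b.stock += delta
    for delta in b.outbox:
        a.stock += delta
    a.outbox.clear()
    b.outbox.clear()

east, west = Replica(stock=1), Replica(stock=1)
east_sold = east.sell_locally()  # True
west_sold = west.sell_locally()  # True: west has not heard about east's sale
sync(east, west)
print(east.stock, west.stock)    # -1 -1

def sell_coordinated(a: Replica, b: Replica) -> bool:
    """Coordinated commit: both copies change together, or neither does."""
    if a.stock <= 0 or b.stock <= 0:
        return False
    a.stock -= 1
    b.stock -= 1
    return True

e2, w2 = Replica(1), Replica(1)
first = sell_coordinated(e2, w2)   # True
second = sell_coordinated(w2, e2)  # False: the last widget is already gone
print(e2.stock, w2.stock)          # 0 0
```

The first half ends with both copies agreeing on minus one; the second half never oversells, at the price of touching both copies on every sale.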

One size fits none

In 2004 Stonebraker wrote a paper arguing that “one database for everything” was a myth. His test: take a relational database, take a column store (a database that stores data by column instead of row — much faster for analytics), and take a stream processor. On their specialty workloads, each one was ten times faster than a general-purpose database. Ten times is not a tuning difference. It’s a different machine.
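A rough sketch of why the storage layout matters for analytics follows. The data and field names are invented, and "values read" is only a crude stand-in for I/O; it illustrates the direction of the effect, not the tenfold figure from the paper.

```python
# Hypothetical order records; the field names are illustrative only.
FIELDS = ("order_id", "customer", "amount")
rows = [
    (1, "acme", 120),
    (2, "bolt", 80),
    (3, "acme", 50),
]

# Row layout: the values of one record sit together (good for "fetch order 2").
# Column layout: the values of one field sit together (good for "sum all amounts").
columns = {name: [row[i] for row in rows] for i, name in enumerate(FIELDS)}

def total_amount_row_layout():
    """Sum amounts by walking whole records; returns (total, values read)."""
    total, values_read = 0, 0
    amount_index = FIELDS.index("amount")
    for row in rows:
        values_read += len(row)  # the whole record is loaded to reach one field
        total += row[amount_index]
    return total, values_read

def total_amount_column_layout():
    """Sum amounts by reading only the amount column; returns (total, values read)."""
    amounts = columns["amount"]
    return sum(amounts), len(amounts)

print(total_amount_row_layout())     # (250, 9)
print(total_amount_column_layout())  # (250, 3)
```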

The modern map looks the same. ClickHouse for analytics. Pinecone for vector search. Postgres for the middle. His pragmatic advice for ordinary-sized problems:

If you want to get going, you have a database problem, the answer is choose Postgres. Until you’re trying to do a million transactions a second, it works just fine.

Above a million transactions per second, or into petabyte data warehouses, Postgres starts losing. It doesn’t have a column store. It doesn’t run across multiple machines natively. For the big leagues, you need specialized tools.

Amazon, he says, has gone too far the other way — they run fifteen database products. Stonebraker would retire twelve. The worst offender: their graph database. Graph databases are databases designed around “nodes and edges” instead of tables — good for showing friend networks or org charts. Stonebraker’s view: the underlying storage is almost always slower than a relational database, so just put a graph-shaped interface on top of a normal database and keep the fast engine underneath.

Why the query optimizer is the hard part

When the interviewer asked what was hardest about building Ingres, the answer was instant: the query optimizer. That’s the piece of the database that takes your English-like query and figures out how to actually execute it: which table to read first, which index to use, whether to hold things in memory. Think of it like a travel agent trying to book a trip with a thousand connecting flights; there are millions of possible routes and only a handful are fast.

It’s just algorithmically difficult. If you ask most any senior database programmer what’s the hardest part, they’ll still say the optimizer.
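A toy version of the problem shows where the difficulty comes from. Everything here is an assumption made for illustration: the table sizes, the selectivities, and the cost model (total rows produced by intermediate joins, left-deep plans only). Real optimizers use far richer statistics and search strategies.

```python
from itertools import permutations
from math import factorial

# Hypothetical table sizes (rows) and join selectivities.
CARD = {"orders": 1_000_000, "customers": 10_000, "regions": 10}
SEL = {
    frozenset({"orders", "customers"}): 1 / 10_000,
    frozenset({"customers", "regions"}): 1 / 10,
}

def plan_cost(order):
    """Toy cost of a left-deep plan: total rows produced by its joins."""
    joined = [order[0]]
    rows = CARD[order[0]]
    cost = 0
    for table in order[1:]:
        rows *= CARD[table]
        # Apply every predicate linking the new table to tables already joined;
        # with no predicate the step is a full cross product.
        for prev in joined:
            rows *= SEL.get(frozenset({prev, table}), 1.0)
        joined.append(table)
        cost += rows
    return cost

plans = sorted(permutations(CARD), key=plan_cost)
best, worst = plans[0], plans[-1]
print("best: ", best, round(plan_cost(best)))
print("worst:", worst, round(plan_cost(worst)))

# The search space grows factorially with the number of tables.
print(factorial(10))  # 3628800 left-deep orders for a 10-table join
```

With three tables there are six orders to try. The cheap plans join the two small tables first, while the expensive ones start with a cross product of orders and regions.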

Related: GPUs don’t help much with traditional databases. GPUs are good at doing the same operation on thousands of pieces of data at once. But looking up a specific row in an index (follow pointer, memory access, follow pointer, memory access) is a chain of dependent steps, where each step needs the result of the one before it, so a single lookup can’t be spread across parallel lanes. GPUs are fast for analytic scans and of little use for indexed lookups.
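The difference between the two access patterns can be sketched without any GPU at all. The tiny tree below is hypothetical, and plain Python stands in for hardware; the thing to notice is which loop has iterations that depend on each other.

```python
# Hypothetical binary search tree: key -> (left child key, right child key).
TREE = {
    50: (25, 75),
    25: (10, 30),
    75: (60, 90),
    10: (None, None),
    30: (None, None),
    60: (None, None),
    90: (None, None),
}
ROOT = 50

def lookup_path(target):
    """Return the nodes visited, in order, while searching for target."""
    path = []
    node = ROOT
    while node is not None:
        path.append(node)
        if target == node:
            break
        left, right = TREE[node]
        # The next node is unknown until this node has been read.
        node = left if target < node else right
    return path

print(lookup_path(60))  # [50, 75, 60]

# By contrast, a scan treats every element independently, so the work can be
# split across as many lanes as the hardware offers.
amounts = [120, 80, 50, 75]
doubled = [a * 2 for a in amounts]
print(doubled)  # [240, 160, 100, 150]
```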

DBOS — the database eats the operating system

Stonebraker’s current swing is the biggest. The insight came from Matei Zaharia (the guy behind Spark and Databricks): a scheduler trying to coordinate a million jobs looks a lot like a database. It’s managing data at scale. So Stonebraker’s group threw the scheduling data into Postgres and watched it outperform Linux’s own scheduler.

The thought that fell out: maybe most of what an operating system does — tracking processes, files, users, state — is just database work. So why not replace the upper half of Linux with a database?

You should keep all the device driver junk down at the bottom because there’s a lot of it and no one wants to do that, and replace everything else with the database implementation.

The academic prototype worked. VCs politely said no one will displace Linux. But they funded a spin-off — DBOS — that ships the same idea as a library. You write normal code in Python, TypeScript, Go, or Java; DBOS stores the state of your program in a database underneath. Your workflow becomes durable (survives a crash, picks up where it left off), transactional, and can fail over to another machine automatically.
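The durability idea can be sketched in a few lines. This is not the DBOS API; it is a hypothetical checkpointing helper built on Python's sqlite3 module, with invented step names, showing how recording each step's result in a database lets a rerun pick up where a crash left off.

```python
import sqlite3

def run_workflow(conn, workflow_id, steps):
    """Run named steps in order, recording each result so a rerun skips finished steps."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS step_log ("
        " workflow_id TEXT, step_name TEXT, result TEXT,"
        " PRIMARY KEY (workflow_id, step_name))"
    )
    results = []
    for name, func in steps:
        row = conn.execute(
            "SELECT result FROM step_log WHERE workflow_id = ? AND step_name = ?",
            (workflow_id, name),
        ).fetchone()
        if row is not None:
            results.append(row[0])  # finished before the crash: reuse the stored result
            continue
        result = func()
        conn.execute("INSERT INTO step_log VALUES (?, ?, ?)", (workflow_id, name, result))
        conn.commit()  # checkpoint: this step is now recorded
        results.append(result)
    return results

calls = []

def charge():
    calls.append("charge")
    return "charged"

attempts = {"n": 0}

def send_email():
    attempts["n"] += 1
    if attempts["n"] == 1:
        raise RuntimeError("simulated crash")
    return "emailed"

conn = sqlite3.connect(":memory:")  # a real deployment would use a persistent database
steps = [("charge", charge), ("send_email", send_email)]
try:
    run_workflow(conn, "order-42", steps)
except RuntimeError:
    pass  # the process "crashed" after the first step was recorded
final = run_workflow(conn, "order-42", steps)  # the rerun resumes at the second step
print(final, calls)  # ['charged', 'emailed'] ['charge']
```

The customer is charged once even though the workflow ran twice, which is the property the paragraph above calls durable.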

Two-thirds of DBOS’s customers are building AI agents. Right now most agents are read-only — they fetch things and make predictions. But agents that write — agents that move money, book travel, update records — need the ability to either finish completely or pretend they never started. That’s exactly what a database is for. Stonebraker thinks this is where the market is heading, and DBOS is sitting on the right pad.
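"Finish completely or pretend they never started" is what a database transaction provides. A minimal sketch with Python's sqlite3 module follows; the accounts, names, and amounts are made up, and the CHECK constraint plays the role of whatever failure interrupts an agent halfway through.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("agent_wallet", 100), ("airline", 0)]
)
conn.commit()

def transfer(src, dst, amount):
    """Move money atomically: both updates commit, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer("agent_wallet", "airline", 60)      # succeeds
failed = transfer("agent_wallet", "airline", 60)  # would overdraw, so it is undone
balances = dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))
print(ok, failed, balances)  # True False {'agent_wallet': 40, 'airline': 60}
```

In the failed transfer the airline was credited first, yet its balance is unchanged afterwards: the half-finished work was rolled back.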

Why the text-to-SQL hype is mostly fake

The last stretch is the most provocative. There are benchmarks called Spider and Bird that test whether an LLM can turn plain English into SQL. Current models score 85% on them. Sounds great.

Stonebraker built a real benchmark — Beaver — from four actual production data warehouses. LLMs score 0%. Give them every retrieval trick and it climbs to 10%. Hand-feed them the exact tables and joins they need to use and it reaches 35%.

Why the gap? Four reasons:

Number one, LLMs are trained on the pile. Data warehouse data is not in the pile.

(The pile is a famous dataset of public internet text used to train models. Your company’s internal warehouse schema isn’t in it.)

Number two, query complexity on Spider and Bird is maybe 10 to 20 lines of SQL. Real world data warehouses it’s 100 lines.

Third, real schemas are messy — column names like zuppers_blah_04 instead of customer_address. Fourth, every business has weird local concepts (MIT’s “J-term” for January classes) that an LLM has never seen.

A human SQL expert scores above 90% on Beaver. The models score zero. That gap is not going to close by throwing more parameters at it.

His fix: don’t ask the LLM to write the whole query. Break it into pieces. Give it the FROM clause and the JOIN structure. And critically, when the answer involves mixing different systems — your warehouse plus your CRM plus some PDFs — don’t let the LLM do the join. Turn everything into tables and let a real query optimizer do it. That’s what he’s building next, with the city of Munich as a customer: trolley schedules (SQL), traffic lights (SQL), intersection diagrams (CAD), federal regulations (text), city regulations (text) — all dumped into one query engine that understands tables.
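The "turn everything into tables" step can be sketched as follows. All data here is invented, the table and column names are hypothetical, and sqlite3 stands in for whatever engine is actually used. Whether an LLM or a parser extracts the rows from documents, the join itself is left to the database.

```python
import sqlite3

# Made-up sample rows standing in for three different source systems.
stops = [("Line 1", "Stop A", "08:05"), ("Line 2", "Stop B", "08:10")]  # schedule database
signals = [("Stop A", 90), ("Stop B", 150)]                             # traffic-light system
rules = [("Stop A", 120), ("Stop B", 120)]                              # limits extracted from text

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stops (line TEXT, intersection TEXT, departs TEXT)")
conn.execute("CREATE TABLE signals (intersection TEXT, cycle_seconds INTEGER)")
conn.execute("CREATE TABLE rules (intersection TEXT, limit_seconds INTEGER)")
conn.executemany("INSERT INTO stops VALUES (?, ?, ?)", stops)
conn.executemany("INSERT INTO signals VALUES (?, ?)", signals)
conn.executemany("INSERT INTO rules VALUES (?, ?)", rules)

# Once everything is a table, the cross-system question is one declarative
# query, and the engine's planner decides how to execute the joins.
query = """
SELECT s.line, s.intersection, g.cycle_seconds, r.limit_seconds
FROM stops AS s
JOIN signals AS g ON g.intersection = s.intersection
JOIN rules AS r ON r.intersection = s.intersection
WHERE g.cycle_seconds > r.limit_seconds
ORDER BY s.line
"""
violations = conn.execute(query).fetchall()
print(violations)  # [('Line 2', 'Stop B', 150, 120)]
```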

On smart people and not-smart people

Asked how he identifies strong engineers: “I have a good feel for how difficult stuff is. If they get 3x the amount done in school that I think is reasonable, then they’re incredible.”

Asked how he spots people who aren’t smart, he’s characteristically blunt:

You talk to them and you can rapidly surface whether they’re smart or not. What was your master’s thesis? How did it exactly work? How did you deal with error conditions? How many processes did you have? Why didn’t you use threads?

Closing life advice: follow your passion, because the alternative is treating your job as the thing that happens between nine and five. His wife has two computer science degrees and wanted to teach K-12; her parents talked her out of it, and she’s regretted it ever since. He is also, however, not optimistic about computer science as a career bet for today’s 18-year-olds. Healthcare and the building trades, he thinks, are safer ground.

Key Takeaways

  • The relational database won because it let you ignore the storage layer. Codasyl made you follow pointers; IMS made you organize as trees. Codd’s insight was: express what you want, let the system figure out how.
  • Extensible types are the hidden superpower of Postgres. Supporting bond arithmetic, GIS polygons, and vectors, all without forking the engine, is why Postgres outlived its cleaner rivals.
  • The query optimizer is the hardest part of a database. Still true fifty years later.
  • Eventual consistency only works when “eventual” is acceptable. For most real businesses — banks, exchanges, inventory — it’s a foot-gun. Google abandoned it with Spanner.
  • Hadoop lost because distributed relational databases were always faster. The 2010 paper by Stonebraker, DeWitt, and co-authors made the case; the industry spent a decade relearning.
  • GPUs can’t accelerate indexed lookups. SIMD parallelism dies the moment you need to follow pointers through a B-tree.
  • Amazon’s 15 database products are a mess of their own making. Graph databases, Stonebraker argues, should be a UI layer, not a storage layer.
  • DBOS’s bet: the state of your program belongs in a database. Makes it durable, transactional, crash-safe — all the things agents writing to the real world need.
  • Agentic AI is mostly read-only today but will turn read-write. When it does, it becomes a distributed database problem.
  • Text-to-SQL benchmarks are toy problems. Real warehouses have 100-line queries, messy schemas, and idiosyncratic data. LLMs score 0% on Stonebraker’s Beaver benchmark, and only 35% even when spoon-fed the joins.
  • The fix isn’t bigger models — it’s decomposition. Break queries into pieces, don’t ask LLMs to join across systems, turn everything into tables and let a real query planner do the join.
  • Stonebraker on hiring: ask deep technical questions about someone’s thesis. If they can’t explain error conditions, process counts, or their design choices, move on.
  • On careers: passion over paycheck. His wife’s regret is a cautionary tale. But also — computer science may not be the obvious bet for new undergrads anymore.

Claude’s Take

Worth the 57 minutes. Stonebraker has the credibility — he actually built the thing — and he’s willing to name names, which most academics won’t. The roasting of Ellison and Google is fun, but it’s not just old-man-yells-at-cloud. The technical substance under each complaint is real. Eventual consistency really did cause a decade of pain. MapReduce really was a detour. The 15-database-products problem at Amazon really is expensive bloat.

The Beaver benchmark finding is the most important thing in the video and deserves more attention than it’s gotten. The honest test of LLMs isn’t Spider’s clean toy schemas — it’s whether you can point one at an actual enterprise warehouse with column names that look like they were typed by a cat walking on a keyboard. Zero percent is a devastating number. Even 35% with heavy hand-holding isn’t useful in production, where the cost of a wrong answer is someone accidentally querying the wrong table and reporting nonsense to the CEO. If you’re evaluating any “AI data analyst” tool — and there are dozens being pitched right now — this is the right frame to critique them.

The DBOS pitch is more speculative. The core insight is genuine: managing state is mostly database work, and programmers currently cobble it together from five different tools (caches, queues, workflow engines, state machines, databases). Folding it into one substrate is elegant. Whether it wins against Temporal, Inngest, and the other durable-workflow shops already carving this market — open question. The “agentic AI needs transactions” argument is the strongest sales pitch for it, and probably correct.

Where I’d push back: Stonebraker’s framing that computer science “may well not be a growth industry going forward” is throwaway grumpy-uncle territory. Healthcare and the building trades over CS is a strange pair to recommend to an 18-year-old who’s any good at math. Take that bit with a large grain of salt.

Score 8/10 because the signal density is high — you leave with a clearer map of what’s actually wrong with a lot of the “modern data stack” narrative, a real benchmark to cite next time someone tells you LLMs have solved text-to-SQL, and a useful default (Postgres until a million TPS) that covers 95% of real-world decisions.

Further Reading

  • “The End of an Architectural Era (It’s Time for a Complete Rewrite)”, Stonebraker et al., 2007. A follow-up to “One Size Fits All: An Idea Whose Time Has Come and Gone” (Stonebraker and Çetintemel, 2005), the original one-size-fits-none paper.
  • “MapReduce and Parallel DBMSs: Friends or Foes?”, Stonebraker, Abadi, DeWitt et al., 2010 (CACM). The Hadoop takedown.
  • Readings in Database Systems (the “Red Book”), edited by Bailis, Hellerstein, and Stonebraker, 5th edition, free online at redbook.io. Canonical entry point for the field.
  • “Spanner: Google’s Globally-Distributed Database”, Corbett et al., 2012. Google’s concession that eventual consistency doesn’t cut it.
  • BEAVER benchmark paper: the real-world text-to-SQL benchmark where LLMs score 0%. Search “Beaver text-to-SQL Stonebraker” for the 2024/25 paper.
  • Matei Zaharia’s Spark paper, “Resilient Distributed Datasets,” 2012. Background on the DBOS co-founder’s thinking.