

Let's Handle 1 Million Requests per Second, It's Scarier Than You Think!

Cododev published 2026-02-05 added 2026-04-10
backend performance scaling redis cpp nodejs aws databases system-design

ELI5/TLDR

A developer rents increasingly powerful cloud computers to see what it actually takes to handle one million web requests every second. He discovers that databases become absurdly expensive bottlenecks at this scale, that storing data in memory (Redis) instead of on disk is the real-world solution companies like Uber use, and that even Node.js eventually runs out of gas — forcing a rewrite in C++ to cross the finish line. The whole experiment cost him about $2,000 in cloud bills and moved 60 terabytes of data in a single 30-minute test.

The Full Story

The Setup: What Does “One Million” Even Mean?

The video opens with context that makes the number feel real. AWS IAM — the security service that guards every other Amazon service — handles about 400 million requests per second globally, roughly the ceiling of anything human engineering currently runs. One million per second, by contrast, is well within what the biggest companies actually do. Uber does it. Netflix does it. Parts of Apple and Google do it.

The host rents two identical machines on AWS: a “power server” (128 CPU cores, 256 GB RAM, $6/hour) and a “power tester” to generate traffic. He also spins up a beefy Postgres database (64 cores, 256 GB RAM, another $6/hour). The architecture is dead simple — one machine throws punches, the other tries not to fall over.

“A simple mistake here would cost your company tens of thousands of dollars. And a mistake here is not a bug. Having a bug here is unfathomable.”

He means it literally. At this scale, choosing an O(n) algorithm over an O(log n) one is the mistake. “Code that just works” stops being good enough and becomes financially ruinous.

Round 1: The Easy Win

A trivial JSON endpoint — {"message": "hi"} — handles 6 million requests per second on the power server running 128 Node.js instances. Victory lap, except nobody’s production endpoint just returns “hi.”

Round 2: A Real-ish Route (and the Network Wall)

A more realistic PATCH endpoint that generates ~30KB of response data per request immediately hits a different wall: the network card. At 100,000 requests per second, the server is pushing 6 GB/s — the full 50 Gbps capacity of its network interface. The CPU is only half-utilized. The bottleneck is the pipe, not the processor.

“Our main bottleneck here is our network. The network speed is not going to allow us to accept more traffic.”

When he shrinks the response to ~1KB, the same route handles 3 million requests per second. The data itself is the constraint. To push 30KB responses at a million per second, you need 300 Gbps of network — machines that cost $30,000/month.
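The arithmetic behind the wall is worth making explicit. A quick back-of-envelope helper (payload bytes only, ignoring TCP/IP and HTTP header overhead):

```javascript
// Payload-only bandwidth math behind the network wall.
function requiredGbps(requestsPerSec, responseBytes) {
  return (requestsPerSec * responseBytes * 8) / 1e9;
}

// 30 KB responses at 1M req/s need ~246 Gbps of payload alone, which is
// why the video ends up on a ~300 Gbps machine for the finale.
console.log(requiredGbps(1_000_000, 30 * 1024)); // ~245.8

// 1 KB responses at 3M req/s fit comfortably inside a 50 Gbps NIC.
console.log(requiredGbps(3_000_000, 1024)); // ~24.6
```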

Round 3: The Database Disaster

This is where things get educational.

Writing to Postgres: 35,000 inserts per second on a $5,000/month database. Quadrupling the disk IOPS (at $1,000/month extra) gets it to 66,000. Still 15x short of one million. The database CPU is pegged at 100%. Scaling up to hit the target would cost an estimated $33,000/month in database costs alone — and might still not work.

Reading from Postgres with 10 million rows is worse. Three different query strategies are tested:

  • Version 1 (ORDER BY RANDOM()): O(n) full table scan. Takes 43 seconds for a single read. The database crashes under load.
  • Version 2 (SELECT COUNT(*)): Also O(n). Also crashes.
  • Version 3 (max ID lookup + random number): O(1) index lookup. Handles 400,000 reads per second immediately.
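The three strategies look nearly identical as SQL, which is the whole point. A sketch written as node-postgres query strings; the `users` table and `id` column are assumptions, not names from the video:

```javascript
// V1: O(n). ORDER BY RANDOM() materializes and sorts every row just to
// return one. 43 seconds per read; crashes under load.
const v1 = 'SELECT * FROM users ORDER BY RANDOM() LIMIT 1';

// V2: also O(n). COUNT(*) still scans the table before a random
// OFFSET can even be computed.
const v2 = 'SELECT COUNT(*) FROM users';

// V3: index work only. Grab the max id, roll a random id client-side,
// then fetch by primary key.
const v3max = 'SELECT MAX(id) FROM users';
const v3get = 'SELECT * FROM users WHERE id = $1';

// Random integer in [1, maxId]. With sequential ids and few deletions,
// nearly every roll lands on an existing row.
function randomId(maxId) {
  return 1 + Math.floor(Math.random() * maxId);
}
```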

“This is why you got to know algorithms if you want to move into such a high stake environment. You make one simple mistake, it could cost you a whole lot down the line.”

The difference between version 1 and version 3 is not a clever optimization. It is the difference between “works” and “database catches fire.” The SQL is almost identical. The cost difference at scale is tens of thousands of dollars per month.

Round 4: Redis to the Rescue

The solution real companies use: stop hitting the database directly. Write to Redis (an in-memory data store), then batch-sync to Postgres in the background.

A single Redis instance handles about 100,000 writes per second — already 3x better than the expensive Postgres setup. But a single instance is single-threaded, so it caps there.

Redis cluster mode — 30 instances spread across 15 masters and 15 replicas, all living in the server’s 256 GB of RAM — crosses the line. One million writes per second. The data stays in memory and gets synced to the real database overnight or via a background process.
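The write-behind idea itself is simple enough to sketch. Here a plain JS array stands in for Redis and an injected `sink` stands in for Postgres, so the shape of the pattern is visible without a running cluster; this is my illustration of the concept, not the video's code.

```javascript
// Write-behind sketch: requests land in a fast in-memory buffer (Redis
// in the video; an array here) and a background job flushes them to the
// durable database in batches.
class WriteBehindBuffer {
  constructor(sink, batchSize = 1000) {
    this.sink = sink; // async function that persists one batch
    this.batchSize = batchSize;
    this.buffer = [];
  }

  write(record) {
    // The request handler only pays for an O(1) append.
    this.buffer.push(record);
  }

  async flush() {
    // One multi-row insert per batch instead of one round-trip per record.
    while (this.buffer.length > 0) {
      await this.sink(this.buffer.splice(0, this.batchSize));
    }
  }
}
```

In the real pattern the buffer lives in Redis, so it survives a process restart and can be drained by a separate worker; the array here only keeps the sketch self-contained.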

“This is what Uber and some of these companies with such insane amount of traffic do. For example, if you were getting lots of locations from your drivers… it’d be crazy to try to save all of them to an SQL database.”

The migration script moves 10 million records from Postgres into Redis. It uses 20 GB of RAM. After several rounds of million-per-second writes, the server has 100 GB of data in memory. At this rate, you need to flush to disk fast or you simply run out of RAM.

A side note on UUIDs: the host switches from sequential IDs to crypto.randomUUID() (122-bit) for the ultra-fast route. Via the birthday paradox, at one million IDs per second, it would take 86,000 years to reach a 50% chance of a single collision. Sequential IDs require an extra write to track the counter. At this scale, that extra write matters.
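The 86,000-year figure checks out. By the standard birthday-bound approximation, a 50% collision chance among N equally likely values takes roughly sqrt(2 · N · ln 2) draws:

```javascript
// Birthday-bound check on the 86,000-year claim. A v4 UUID carries 122
// random bits.
const N = 2 ** 122;
const idsFor50pct = Math.sqrt(2 * N * Math.LN2); // ~2.7e18 UUIDs

const years = idsFor50pct / 1e6 / (365.25 * 24 * 3600);
console.log(Math.round(years)); // ~86,000 years at 1M IDs/sec
```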

Round 5: Node.js Taps Out, C++ Steps In

For the 30KB PATCH route, even on a “beast” machine (192 cores, 384 GB RAM, 600 Gbps network, $11/hour), Node.js maxes out around 700,000-800,000 requests per second. Express manages barely half a million. The overhead of JavaScript — distributing traffic from a parent process to child workers, JSON parsing, string manipulation — becomes the ceiling.

The host rewrites the server in C++ using Drogon (one of the fastest web frameworks in existence) and RapidJSON (one of the fastest JSON parsers). The default Drogon JSON parser was actually 4x slower than Node’s V8 engine — a detail that took real debugging to discover.

With C++ and RapidJSON: 1 million requests per second, using only 70% of CPU. Node.js needed 100% of CPU to hit 700K.

“C++ is just so powerful with this framework here… whenever you want to have a lot of power and a lot of speed, you would use Drogon. And whenever you want to have development speed, you would use Node.js.”

The Grand Finale: 60 Machines, 30 Minutes, 2 Billion Requests

For the final test, 60 small tester machines (8 cores each) bombard the single beast server for 30 minutes straight. The results:

  • 2 billion total requests handled
  • 60 terabytes of data transferred
  • 40 timeouts out of 2 billion requests
  • Network throughput: 300 Gbps (the server’s disk reads at 5 GB/s by comparison — the network is 8x faster than the SSD)
  • CPU idle: 0%
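Those figures are mutually consistent, which lends them credibility. A quick cross-check, assuming the ~30KB response size from the PATCH route:

```javascript
// Cross-checking the finale numbers against each other.
const requests = 2e9;
const bytesPerResponse = 30 * 1024;
const seconds = 30 * 60;

const totalBytes = requests * bytesPerResponse;
console.log(totalBytes / 1e12);   // ~61 TB, matching the reported 60 TB

const gbps = (totalBytes / seconds) * 8 / 1e9;
console.log(gbps);                // ~273 Gbps, close to the reported 300

console.log(requests / seconds);  // ~1.1M req/s sustained
```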

The electricity consumed in one hour would power a Tesla for thousands of kilometers.

An attempt to use AWS Network Load Balancer with two beast servers actually degraded performance. AWS support confirmed the load balancer hit a capacity limit of 165 LCUs — you need to pre-reserve capacity for traffic at this scale. Even AWS’s own infrastructure needs a heads-up when you show up with this much data.

The Cost of It All

  • Power server: ~$5,000/month
  • Power tester: ~$5,000/month
  • Postgres database: $5,000-$7,000/month (and still couldn’t hit 1M writes/sec)
  • Beast machines: ~$8,000/month each
  • 60 tester machines: ~$20,000/month
  • Total video production cost: ~$2,000 in AWS bills

For comparison, if you hit the OpenWeatherMap API one million times per second for a month, the bill would be $3.8 billion. Cloudflare Workers at the same rate: ~$1 million/month. Running your own infrastructure is dramatically cheaper at this scale.

Claude’s Take

This is a genuinely well-constructed piece of engineering content disguised as a “let’s break things” video. The progression from trivial endpoint to database to Redis to C++ is pedagogically sound, and each bottleneck is a real-world lesson: network limits before CPU at high data volumes, O(n) vs O(1) as a cost multiplier rather than an academic exercise, the memory-as-database pattern that actual high-traffic companies use.

The strongest parts are the database sections. Watching ORDER BY RANDOM() crash a $5,000/month database while a simple index lookup on the same machine handles 400K reads/sec is a better lesson in algorithmic complexity than most textbooks offer. The cost framing makes it visceral — this is not “your program runs slowly,” it is “your company burns $30,000 a month because of one SQL clause.”

The Redis section is solid and practical. The pattern of write-to-memory, sync-to-disk-later is genuinely how companies like Uber handle location data at scale. The UUID entropy calculation is a nice touch — most developers cargo-cult sequential IDs without understanding why UUIDs are safe.

A few caveats. The claim that Python, Java, and Go “couldn’t handle 1 million requests per second” for the CPU-intensive route deserves scrutiny. Go in particular, with its native concurrency model, should perform much closer to C++ than Node.js for this workload. The bottleneck was likely JSON serialization and the specific framework choices rather than the languages themselves. Java with virtual threads and a fast serializer (Jackson with afterburner, or similar) would also be competitive. The framing implies a hierarchy of languages that is more situational than absolute.

The video’s biggest blind spot is that it treats “one million requests per second on a single machine” as the goal, when in practice, companies distribute this load across many machines behind load balancers in multiple regions. The host acknowledges this briefly but then spends the rest of the video chasing the single-machine milestone. This is fine for entertainment and learning, but it is worth noting that the engineering challenge at Uber is not “one big server” — it is orchestration, consistency, failover, and geographic distribution. Those problems are harder and less photogenic.

The $2,000 total cost is honestly reasonable for the amount of genuine engineering knowledge packed in here. The host is transparent about costs throughout, which is refreshing. Most cloud-scale content hand-waves the money part.