YouTube

Juan Alday: Why C++ Wins in Finance

Citadel Securities published 2026-04-28 added 2026-05-01 score 8/10
cpp hft low-latency systems-programming trading-infrastructure citadel

ELI5/TLDR

A Citadel Securities engineer explains why high-frequency trading desks live and die on C++. The whole pitch is one budget: 30 microseconds to receive a market signal, decide, check risk, and fire an order to Chicago, with transit over fiber alone eating about 20 of those microseconds. Miss by 10 nanoseconds and someone else gets the trade. The talk’s claim is that C++ is the only language that lets you control every nanosecond, and that the modern versions (C++20, 23, 26) are quietly fixing the parts that used to be miserable.

The Full Story

The 30-microsecond budget

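Alday opens with a number (totally made up, he admits), but it’s the number that defines the job. A trading signal comes in. You have 2 microseconds to decide, 5 to check risk, and 3 to fire the order, which leaves about 20 for the wire to the market in Chicago. That’s your full budget: 2 + 5 + 3 + 20 = 30 microseconds. The distance figures quoted alongside it (Chicago ~6000 km away, fiber at about 20 nanoseconds per kilometer) are as made up as the rest: light in fiber actually needs roughly 5 microseconds per kilometer, so 20 microseconds of transit covers only about 4 km.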
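The budget is easy to write down as compile-time arithmetic. The sketch below is only an illustration: the `Budget` type, its field names, and the 20-microsecond wire share (inferred from the 30-microsecond total) are assumptions, not anything shown in the talk.

```cpp
#include <cassert>
#include <chrono>

using namespace std::chrono_literals;
using Micros = std::chrono::microseconds;

// Hypothetical latency budget; the stage names mirror the summary above.
struct Budget {
    Micros decide;
    Micros risk;
    Micros send;
    Micros wire;  // assumed share left for transit

    constexpr Micros compute() const { return decide + risk + send; }
    constexpr Micros total() const { return compute() + wire; }
};

// The talk's illustrative numbers: 2 + 5 + 3 of compute, 20 on the wire.
inline constexpr Budget kTalkBudget{2us, 5us, 3us, 20us};
static_assert(kTalkBudget.compute() == 10us);
static_assert(kTalkBudget.total() == 30us);

// True if a measured signal-to-order time fits inside the budget.
constexpr bool within(const Budget& b, std::chrono::nanoseconds measured) {
    return measured <= b.total();
}

// Ten nanoseconds over is still a miss.
static_assert(!within(kTalkBudget, 30us + 10ns));
```

Keeping the stages as separate fields makes it obvious which part of the budget a regression is eating.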

“If you missed by 10 nanos, someone else gets the fill. You’ll get zilch.”

The point isn’t average speed. It’s the worst case — the 99.9th percentile. Every single time, billions of times a day, you have to hit that budget.
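A small sketch shows why the average hides the problem. It computes a nearest-rank percentile over latency samples; the function names and the per-mille parameter are illustrative choices, not something from the talk.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Nearest-rank percentile over latency samples in nanoseconds.
// `permille` is the percentile in tenths of a percent (999 means p99.9).
// Takes the samples by value because it sorts them.
std::uint64_t percentile_ns(std::vector<std::uint64_t> samples, unsigned permille) {
    assert(!samples.empty() && permille >= 1 && permille <= 1000);
    std::sort(samples.begin(), samples.end());
    // Integer ceiling of n * permille / 1000, so no floating-point rounding.
    const std::size_t rank = (samples.size() * permille + 999) / 1000;
    return samples[rank - 1];
}

// Arithmetic mean, for comparison with the tail.
std::uint64_t mean_ns(const std::vector<std::uint64_t>& samples) {
    assert(!samples.empty());
    return std::accumulate(samples.begin(), samples.end(), std::uint64_t{0}) /
           samples.size();
}

// Hypothetical pass/fail check: the p99.9 tail, not the mean, must fit.
bool meets_budget(const std::vector<std::uint64_t>& samples, std::uint64_t budget_ns) {
    return percentile_ns(samples, 999) <= budget_ns;
}
```

With 998 samples at 1 microsecond and two 5-millisecond outliers, the mean is about 11 microseconds and looks comfortable against a 30-microsecond budget, while p99.9 lands on the 5-millisecond outlier and fails it.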

“The choice of computing language is not academic. It’s not an experiment. It is practical, and it has to work all the time.”

Why other languages can’t do this

The villain of the story is the garbage collector — the background janitor that languages like Java or Go run to clean up memory you’re no longer using. Most of the time it’s invisible. But it can pause your program for milliseconds at a moment of its choosing.

“The worst thing that can happen is that at 11, after some economic number release, you get a garbage collector five milliseconds in the middle of a hot path, and you lose millions and millions of dollars.”

Five milliseconds is 5,000 microseconds, roughly 167x your entire budget. C++ has no garbage collector. You manage memory yourself, which is more work, but there is no collector to pause you at a moment of its choosing.
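One common way to "manage memory yourself" is to reserve everything up front and never touch the allocator on the hot path. The sketch below is a minimal fixed-capacity pool under that assumption; `Order` and `OrderPool` are hypothetical names, and a production pool would also need double-release detection and a threading story.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical message type, for illustration only.
struct Order {
    std::uint64_t id;
    std::int64_t  price;
    std::uint32_t qty;
};

// Fixed-capacity pool: all storage lives inside the pool object, so
// acquire() and release() never call into the heap allocator.
template <std::size_t N>
class OrderPool {
public:
    OrderPool() {
        for (std::size_t i = 0; i < N; ++i) free_[i] = i;  // every slot starts free
    }

    // Returns nullptr when the pool is exhausted instead of allocating more.
    Order* acquire() {
        if (free_count_ == 0) return nullptr;
        return &slots_[free_[--free_count_]];
    }

    // Returns a slot to the pool. Does not detect double release.
    void release(Order* o) {
        assert(o >= slots_.data() && o < slots_.data() + N);
        free_[free_count_++] = static_cast<std::size_t>(o - slots_.data());
    }

    std::size_t available() const { return free_count_; }

private:
    std::array<Order, N>       slots_{};
    std::array<std::size_t, N> free_{};   // stack of free slot indices
    std::size_t                free_count_ = N;
};
```

Both operations are a couple of array accesses with no branches into the runtime, which is the kind of predictability the talk is arguing for.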

Control all the way down

C++ gives you four kinds of control most languages hide from you:

  1. Memory layout. You can decide which bytes sit next to which bytes, so the CPU’s small on-chip memory (the cache, which loads data in fixed-size cache lines) pulls in exactly what you need. You can pin work to specific CPU cores and keep its memory on the same NUMA node, so data doesn’t have to cross the motherboard.
  2. Direct hardware access. You can talk to the network card (NIC) without asking the operating system for permission. The OS adds a tax — context switches, copies, kernel checks — that you skip entirely.
  3. Predictable code generation. Whatever the compiler produces is what runs, every time. No just-in-time compiler suddenly deciding to “re-evaluate your hot path” mid-trade.
  4. Abstraction without cost. Unlike C, you can write high-level code (templates, classes) that compiles down to the same machine instructions a hand-written version would produce.
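Points 1 and 4 can be sketched in a few lines. This is an illustration under stated assumptions: the 64-byte cache line is typical for x86-64 but not universal, and `PaddedCounter`, `Quote`, and `mid` are made-up names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed cache-line size; check the target hardware before relying on it.
inline constexpr std::size_t kCacheLine = 64;

// Each counter occupies its own cache line, so two cores updating different
// counters do not keep invalidating each other's line (false sharing).
struct alignas(kCacheLine) PaddedCounter {
    std::uint64_t value = 0;
};
static_assert(sizeof(PaddedCounter) == kCacheLine);
static_assert(alignof(PaddedCounter) == kCacheLine);

// The fields read together on the hot path are laid out contiguously and
// fit in a single line, so one fetch brings all of them in.
struct alignas(kCacheLine) Quote {
    std::int64_t  bid_price;
    std::int64_t  ask_price;
    std::uint32_t bid_qty;
    std::uint32_t ask_qty;
    std::uint64_t exchange_ts_ns;
};
static_assert(sizeof(Quote) == kCacheLine);

// A generic helper: the template is resolved at compile time, so there is
// no virtual call or boxing left at run time.
template <typename T>
constexpr T mid(T bid, T ask) {
    assert(bid <= ask);
    return bid + (ask - bid) / 2;
}
static_assert(mid<std::int64_t>(100, 104) == 102);
```

The `static_assert` lines turn layout assumptions into build failures, which is cheaper than discovering a changed struct size in a latency graph.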

The complacent years

C++ won the speed wars and then got stuck in them. For decades, writing fast C++ meant writing ugly C++ — hand-unrolled loops, goto statements, SFINAE (a notoriously cryptic way of writing generic code, where the compiler quietly disables overloads that don’t fit; the acronym stands for “substitution failure is not an error” and the joke is that nothing about it is intuitive).

“We started handcrafting for loops and gotos in our code and loop unrollings… we just had unconstrained templates and we just begged for mercy every time we coded them.”

You wrote how the machine should do something. Loop by loop, byte by byte.
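For readers who never had to write it, this is roughly what that era looked like. The sketch is illustrative, not code from the talk: a pair of `std::enable_if` overloads and a hand-unrolled sum, with made-up function names.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Pre-C++20 style: this overload exists only when T is integral. For any
// other T, substitution into enable_if fails and the overload quietly
// drops out of the candidate set.
template <typename T,
          typename std::enable_if<std::is_integral<T>::value, int>::type = 0>
T half(T x) {
    return x / 2;
}

// The floating-point twin, selected by the same mechanism.
template <typename T,
          typename std::enable_if<std::is_floating_point<T>::value, int>::type = 0>
T half(T x) {
    return x * T(0.5);
}

// Hand-unrolled sum: four independent accumulators per iteration, then a
// remainder loop for the last n % 4 elements.
long sum_unrolled(const int* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += data[i];
        s1 += data[i + 1];
        s2 += data[i + 2];
        s3 += data[i + 3];
    }
    for (; i < n; ++i) s0 += data[i];
    return s0 + s1 + s2 + s3;
}
```

Nothing in the signatures says "integral" or "sum" in a way a reader can see at a glance; the intent lives in the author's head.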

The shift: telling the compiler what, not how

Each new C++ standard takes some piece of manual cleverness and turns it into a language feature.

  • C++20: concepts. Replace SFINAE incantations with named, readable constraints on generic types. The compiler now knows what you meant.
  • C++20/23: ranges. Introduced in C++20 and filled out in C++23. Instead of writing a for loop, you describe the data transformation you want. Filter, then map, then sum. The compiler figures out the loop.
  • C++26: std::execution. The same treatment for asynchronous code. You describe a pipeline of operations, including how to cancel or handle errors, and the compiler wires it up. Error and cancellation paths become part of the pipeline’s type rather than runtime surprises.

The last frontier: concurrency

Concurrency — running things in parallel without corrupting your data — has been the wild west of C++ for decades. Every team had its own folklore.

“Throw a couple of threads here, a couple of mutexes, a lot of queues… now everything is a lock-free queue.”

Bursty load triggered race conditions. Process restarts at 3 AM revealed deadlocks. The fix was usually to kill the process and start fresh — which, in a trading system, is exactly the moment you didn’t want it dead.

The new model is structured concurrency: pipelines you can pause, cancel, or compose, with the cancellation logic baked in rather than bolted on.

The two-sentence summary

“Stop coding the mechanics and start expressing intent.”

C++ used to demand that you describe every move. The new C++ asks you to describe the goal. The compiler — which, Alday concedes, “does it better than me” — handles the rest.

Key Takeaways

  • Worst-case latency is the only metric that matters in HFT. Average speed is a vanity number; you’re dead at the 99.9th percentile.
  • Light is slow. In the talk’s budget, the trip to Chicago consumes most of the 30 microseconds before any computation happens.
  • No garbage collector is C++’s killer feature for trading: predictability beats convenience.
  • Kernel bypass (talking directly to the NIC, skipping the OS) is standard practice in HFT, and C++ is one of the few languages that lets you do it cleanly.
  • Modern C++ (20/23/26) is converging on the same idea functional languages had: describe the transformation, not the loop. The performance comes for free because the compiler is now smarter than the human.
  • std::execution in C++26 is the long-awaited fix for concurrency — structured async pipelines with first-class cancellation and errors-as-types.

Claude’s Take

This is a marketing talk for Citadel’s engineering brand, but it’s a good one — short, technically honest, and free of the usual “we’re hiring smart people” filler. The 30-microsecond setup is a teaching device, not a real spec, but it makes the constraint legible to anyone who’s never had to think about nanoseconds.

The argument isn’t that C++ is better than every alternative — it’s that it’s the only mainstream language that lets you opt out of every abstraction you don’t want. Rust is the obvious shadow here; Alday doesn’t mention it, which is a tell. The real answer is probably “C++ has the ecosystem, the compilers, the decades of tuning, and the talent pool already in place” — switching costs more than it saves.

The more interesting half of the talk is the meta-point: C++ has spent a decade quietly becoming a language about expressing intent rather than micromanaging the machine. That arc — concepts replaced template hacks, ranges replaced for-loops, std::execution will replace ad-hoc threading — is the kind of thing you can only see if you’ve been around long enough to have suffered through the old way. 8/10 for being a tight, honest 8 minutes.

Further Reading

  • C++26 std::execution (P2300) — the proposal Alday is referencing for structured async
  • “What Every Programmer Should Know About Memory” by Ulrich Drepper — the canonical text on cache lines and NUMA
  • Solarflare / Onload, DPDK — the actual kernel-bypass networking stacks HFT shops use