The Network as a Program with Nate Foster

ELI5 / TLDR

For decades, building a computer network meant buying a pile of boxes called routers, wiring them together, and then hiring people to log into each box one at a time and hand-tune its settings until traffic flowed the way you wanted. It worked, but it was slow, fragile, and nobody could really predict what the whole thing would do. Nate Foster’s career is about a simple-sounding idea that quietly took over the field: stop poking individual boxes and instead treat the entire network as one big computer program you can write, review, and test like any other software. This conversation walks through how that idea grew up, where it succeeded, where it flopped, and how Foster now uses it to manage Jane Street’s worldwide network.

The Full Story

A failed physicist who fell for the parts that fit together

Foster opens with an origin story that will sound familiar to anyone who switched majors. He started in physics, found himself liking it less with each course, and kept sneaking off to computer science classes that he actually enjoyed. What hooked him was a feeling of coherence — you could trace a program all the way down through a compiler, through the instruction set, through the chip, down to the logic gates, and it all connected.

“I really liked how all the pieces kind of fit together… unlike physics say there’s a lot of creativity, a lot of these abstractions that we have in computing are really designed by people.”

That last point matters for everything that follows. In physics you discover the rules. In computing, humans invent the rules — things like lambda calculus, a bizarre little mathematical system for describing computation that somehow ends up running on real machines. Foster found that beautiful, and the rest of the talk is really about a recurring discovery: that messy, human-built systems often hide a clean mathematical skeleton inside them, if you look hard enough.

Lenses — the warm-up problem

Before networks, Foster spent six years of his PhD on something called lenses. Here’s the problem, stripped down. Imagine you keep your calendar in two formats — one on your phone, one on your laptop — and you want them to stay in sync. To do that, software has to convert format A into format B, and also convert B back into A. Two separate conversion functions. Foster’s advisor noticed these two functions were almost mirror images of each other, so why write them twice and risk them drifting apart?

A lens is the answer: a single object from which you can derive both directions of the conversion. Write it once, get the round trip for free, with mathematical guarantees that changes on one side show up faithfully on the other. The idea caught fire — the Haskell programming community loosened the original definition and now uses lenses “all over the place.” This is the warm-up because it plants the seed of Foster’s whole method: find the one clean abstraction that generates the messy stuff automatically.
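To make the idea concrete, here is a minimal Python sketch. It assumes a lens can be modeled as a pair of functions, `get` (source to view) and `put` (edited view plus old source to new source). The `Lens` class, the `key` helper, and the calendar fields are all illustrative names, not taken from Foster's work or from any Haskell library. The point is that `start_time` is defined once, both directions fall out of that definition, and the round-trip guarantees can be checked directly.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Lens:
    get: Callable[[Any], Any]        # source -> view
    put: Callable[[Any, Any], Any]   # (edited view, old source) -> new source

    def then(self, inner: "Lens") -> "Lens":
        """Compose two lenses; both directions of the result come for free."""
        return Lens(
            get=lambda s: inner.get(self.get(s)),
            put=lambda v, s: self.put(inner.put(v, self.get(s)), s),
        )


def key(name: str) -> Lens:
    """Hypothetical helper: a lens focusing on one key of a dict."""
    return Lens(
        get=lambda d: d[name],
        put=lambda v, d: {**d, name: v},
    )


# One definition: focus on the start time inside an event's schedule.
start_time = key("schedule").then(key("start"))

event = {"title": "Standup", "schedule": {"start": "09:00", "end": "09:15"}}

# Forward direction: extract the view.
view = start_time.get(event)

# Backward direction: push an edited view back into the source.
moved = start_time.put("10:00", event)

# Round-trip checks (commonly called GetPut and PutGet).
get_put_holds = start_time.put(start_time.get(event), event) == event
put_get_holds = start_time.get(start_time.put("11:30", event)) == "11:30"
```

The composition in `then` is where the "write it once" payoff shows up: nobody writes the backward conversion for `start_time` by hand, so it cannot drift away from the forward one.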

The big career swerve, and what “software-defined networking” actually means

After his PhD, Foster took a leap into networking, a field he knew almost nothing about. He landed in the middle of a revolution called software-defined networking (SDN). To understand it, you need the “before” picture.

A network is built from switches and routers — boxes whose job is to receive a chunk of data (a packet) and shove it out the right wire toward its destination. The old way: you buy boxes from a vendor, the vendor decides most of what they can do, you physically wire them up, and then network engineers carefully configure each box by hand. Change anything and you risk breaking the whole thing in ways nobody can predict.

Two pressures broke this model. Big companies — Google, Amazon — wanted the freedom to change how their networks behaved without begging a vendor. And their networks got enormous, too complex to hand-tune. So the field proposed a new idea:

“The main kind of slogan is the network should just be thought of as like another program.”

Instead of configuring each box separately, you write one program describing how the network as a whole should behave, and that program gets compiled down into settings for all the individual boxes. The payoff is twofold. You get flexibility — you can deploy new behavior at software speed instead of waiting years for a standards committee. And you get reasoning — from a relatively small program you can predict and verify how the whole sprawling system will act, and enforce the properties you care about (these two machines must always be able to talk; these two must never be able to talk).
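A toy version of that pipeline can be sketched in Python. Everything here is an assumption made for illustration: the three-switch topology, the host names, and the shape of the policy are invented, and real SDN compilers are far more involved. The sketch shows a single network-wide policy being compiled into one forwarding table per switch, and then the two example properties (these hosts can talk, these cannot) being checked against the compiled result.

```python
from collections import deque

# Hypothetical topology: switch -> neighbouring switches.
LINKS = {"s1": ["s2"], "s2": ["s1", "s3"], "s3": ["s2"]}
# Which switch each host is attached to.
ATTACHED = {"web": "s1", "db": "s3", "guest": "s2"}

# The network-wide "program": which destinations each source may reach.
POLICY = {"web": {"db"}, "db": {"web"}, "guest": set()}


def shortest_path(src, dst):
    """Breadth-first search over switches; returns the list of hops."""
    parent, queue = {src: None}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in LINKS[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


def compile_policy(policy):
    """Turn the global policy into one forwarding table per switch."""
    tables = {sw: {} for sw in LINKS}
    for src, dsts in policy.items():
        for dst in dsts:
            path = shortest_path(ATTACHED[src], ATTACHED[dst])
            for here, nxt in zip(path, path[1:]):
                tables[here][(src, dst)] = nxt
            tables[path[-1]][(src, dst)] = "deliver"
    return tables


def can_reach(tables, src, dst):
    """Walk the compiled tables hop by hop, as the switches would."""
    here, seen = ATTACHED[src], set()
    while here not in seen:
        seen.add(here)
        action = tables[here].get((src, dst))
        if action is None:
            return False      # no matching rule: the packet is dropped
        if action == "deliver":
            return True
        here = action
    return False              # forwarding loop


tables = compile_policy(POLICY)
web_reaches_db = can_reach(tables, "web", "db")
guest_reaches_db = can_reach(tables, "guest", "db")
```

Nobody edits `tables` by hand in this model. Changing network behavior means changing `POLICY` and recompiling, which is what makes the result predictable from a small program.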

Foster is blunt about why this works now and didn’t before: it’s mostly economics. The internet was deliberately built so tens of thousands of independent organizations, on different hardware, could interoperate — and that’s exactly why it’s so hard to change. But inside a single company that owns its whole giant network, there’s “one ultimate unit of control” that gets to treat the thing as one program.

The one weird trick: messy domains hide simple math

The intellectual heart of the talk is a pattern Foster keeps hitting. He and his collaborators built a network language called NetKAT. Partway through, they realized it lined up almost perfectly with a decades-old piece of pure math called Kleene algebra with tests — the same kind of theory behind the finite-state machines a second-year CS student learns. They didn’t invent that math; they discovered their practical, weird-looking network language was secretly an instance of it.

“I sort of think of this as like the one weird trick of programming language theory, which is that… a lot of the best ideas in programming languages come from relating the thing you’re doing to very simple mathematical models.”

Why care? Because languages anchored to clean math tend to generalize. Features you add to solve one problem turn out to solve others and compose nicely. When Foster later extended NetKAT to handle probabilities — necessary because real networks have unpredictable traffic, random failures, and randomized load-balancing — the underlying algebra acted as a guardrail. Without it, he says, “we would have very easily ended up with a language that was kind of incoherent.”
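For a feel of what “an instance of Kleene algebra with tests” means, here is a heavily simplified Python sketch. It assumes a policy is a function from one packet to a set of packets, and it omits packet histories and probabilities entirely, so it is a teaching model rather than NetKAT itself. The function names (`filt`, `assign`, `union`, `seq`, `star`) are chosen for this example. Tests filter packets, assignments rewrite a field, and the three combinators are the familiar choice, sequencing, and iteration operators from regular expressions.

```python
def pkt(**fields):
    """A packet is an immutable set of (field, value) pairs."""
    return frozenset(fields.items())


def filt(field, value):
    """Test: keep the packet if the field matches, otherwise drop it."""
    return lambda p: {p} if (field, value) in p else set()


def assign(field, value):
    """Modification: rewrite one field of the packet."""
    def run(p):
        updated = dict(p)
        updated[field] = value
        return {frozenset(updated.items())}
    return run


def union(a, b):
    """Choice: behave like a and like b, collecting both results."""
    return lambda p: a(p) | b(p)


def seq(a, b):
    """Sequencing: feed every output of a into b."""
    return lambda p: {r for q in a(p) for r in b(q)}


def star(a):
    """Iteration: apply a zero or more times (terminates on finite packet spaces)."""
    def run(p):
        seen, frontier = {p}, {p}
        while frontier:
            frontier = {r for q in frontier for r in a(q)} - seen
            seen |= frontier
        return seen
    return run


# A tiny made-up network: switch 1 forwards to switch 2, switch 2 to switch 3.
hop = union(
    seq(filt("sw", 1), assign("sw", 2)),
    seq(filt("sw", 2), assign("sw", 3)),
)

start = pkt(sw=1, dst="B")

# "Take any number of hops, then keep only packets sitting at switch 3."
arrived = seq(star(hop), filt("sw", 3))(start)
```

Because the operators obey algebraic laws (for instance, `union(p, p)` behaves like `p`, and applying the same test twice behaves like applying it once), questions about network behavior can be recast as questions about equations, which is where the decades-old theory starts paying rent.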

There’s a nice deflating moment here too. Foster and collaborators wrote a paper arguing “the network as a program” had arrived. It got rejected — not because it was wrong, but because reviewers said it was too obviously true to be interesting: “your ideas aren’t spicy enough… this is how things work.” The idea won so thoroughly it became invisible.

It’s not just clean abstractions — it’s the whole culture of software

Foster’s interviewer (Ron Minsky) presses a point that’s easy to miss. The deepest win of SDN may not be the elegant semantics at all. It’s that treating a network as software lets you drag in all the ordinary tools and habits of software engineering — version control, code review, testing, a central place where changes get proposed and checked before they go live.

In the old world, an engineer would log into a router and, in Minsky’s phrase, “yolo a change” — make an edit live, on the box, and hope. Plenty of giant real-world outages come from exactly that: a config change with an unexpected effect. The new world lets you catch it the way software teams catch bad code: don’t merge that change. On top of that sits verification — automated tools that take a snapshot of the network and prove properties about it (“these two hosts stay connected no matter what,” “these two stay isolated”). This is now routine at the hyperscalers. Foster is careful not to oversell it; networks are distributed systems with failures and weird interactions nobody modeled, so outages still happen. But one well-solved piece — checking a consistent snapshot of forwarding behavior with a logic solver — “mostly just works.”
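The “don't merge that change” workflow can be illustrated with a small Python sketch. It assumes a snapshot is just a mapping from each node to its next hop per destination, and it checks properties by tracing paths directly. That direct tracing is a stand-in chosen for brevity: the tools described in the conversation use logic solvers, and the node names and properties below are invented.

```python
def trace(snapshot, start, dst):
    """Follow next-hop entries from start toward dst.

    Returns 'delivered', 'dropped', or 'loop'.
    """
    here, visited = start, set()
    while True:
        if here == dst:
            return "delivered"
        if here in visited:
            return "loop"
        visited.add(here)
        nxt = snapshot.get(here, {}).get(dst)
        if nxt is None:
            return "dropped"
        here = nxt


def check(snapshot, must_reach, must_not_reach):
    """Return a list of human-readable violations (empty means safe to merge)."""
    violations = []
    for src, dst in must_reach:
        outcome = trace(snapshot, src, dst)
        if outcome != "delivered":
            violations.append(f"{src}->{dst}: {outcome}")
    for src, dst in must_not_reach:
        if trace(snapshot, src, dst) == "delivered":
            violations.append(f"{src}->{dst}: reachable but must be isolated")
    return violations


MUST_REACH = [("a", "b")]
MUST_NOT_REACH = [("a", "c")]

# Current forwarding state: node -> {destination: next hop}.
current = {
    "a": {"b": "r1"},
    "r1": {"b": "r2"},
    "r2": {"b": "b"},
}

# A proposed change that accidentally points r2 back at r1.
proposed = {**current, "r2": {"b": "r1"}}

current_ok = check(current, MUST_REACH, MUST_NOT_REACH)
proposed_problems = check(proposed, MUST_REACH, MUST_NOT_REACH)
```

Run as a pre-merge check, the proposed change would be rejected before it ever reached a router, which is the cultural shift the paragraph above describes.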

Going to a hardware company to stop feeling like a fraud

Foster admits to impostor feelings — the programming-languages academic “cosplaying in networking.” So he spent a sabbatical inside Barefoot Networks, a startup building a fully programmable router chip, to learn how routers really work. He came away in awe of hardware people and with a research direction he’d never have found otherwise: in-network computing.

The insight, from a colleague: if a router chip is programmable enough, it’s “just another kind of processor” — strange, memory-starved, but blisteringly fast. A packet in a data center passes through several of these on its way somewhere. What if each did a little useful work along the way? This is genuinely controversial — there’s a famous old principle (the end-to-end argument) that says keep the network dumb and put smarts at the edges, because features baked into the network make everyone pay for things only some users want. Foster respects the principle but thinks some in-network computing becomes inevitable, and we already see hints of it in the networks feeding modern machine-learning systems.

His broader plea: research communities should celebrate ideas that don’t pan out. Not “couldn’t solve it,” but “solved it, and the world chose a different path anyway.” When a field gets too orthodox, “the world just got a little smaller.”

Multicast: a great idea that failed everywhere except trading

The conversation takes a long, concrete digression into multicast, a technique for efficiently sending the same data to many recipients at once: the network lays out a tree and lets switches copy the data down multiple branches in parallel. Decades ago people thought this would be how we’d stream video to everyone. It didn’t happen.

The reason is a subtle imbalance. Sending the actual data (the “data plane”) turned out to be cheap and abundant, but the bookkeeping needed to set up and maintain all those distribution trees (the “control plane”) turned out to be expensive and scarce. When millions of people want millions of different streams, switches simply run out of room for the bookkeeping. So the public internet quietly abandoned multicast; cloud providers barely support it, and when they do it’s a slow fake kept around so ancient software doesn’t break.

But trading is different. Exchanges broadcast market data to everyone consuming it — a small number of channels, not millions. With few channels, the bookkeeping problem vanishes and multicast’s “magic powers” work beautifully. Minsky suspects multicast is broadly underused elsewhere — for distributing data inside systems, for state-machine replication (a core technique for building reliable distributed systems) — simply because the wider world wrote it off.
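The trade-off can be put into rough numbers with a small Python sketch. The topology is invented, and the state model is a deliberate simplification: it assumes every multicast group uses the same tree and costs one table entry on each switch along it. Under those assumptions, the sketch counts how many link transmissions one message needs with and without multicast, and how the per-switch bookkeeping grows with the number of groups.

```python
# Hypothetical distribution tree: node -> children.
TREE = {
    "src": ["core"],
    "core": ["agg1", "agg2"],
    "agg1": ["h1", "h2"],
    "agg2": ["h3", "h4"],
}
RECEIVERS = ["h1", "h2", "h3", "h4"]


def path_to(node, target, tree):
    """Depth-first search for the path from node down to target."""
    if node == target:
        return [node]
    for child in tree.get(node, []):
        sub = path_to(child, target, tree)
        if sub:
            return [node] + sub
    return None


def unicast_link_sends(tree, receivers):
    """One full copy per receiver, each crossing every link on its path."""
    return sum(len(path_to("src", r, tree)) - 1 for r in receivers)


def multicast_link_sends(tree, receivers):
    """One copy per link of the tree; switches duplicate at branch points."""
    links = set()
    for r in receivers:
        p = path_to("src", r, tree)
        links.update(zip(p, p[1:]))
    return len(links)


def multicast_state_entries(tree, receivers, groups):
    """Simplified control-plane cost: one entry per group on each tree switch."""
    switches = set()
    for r in receivers:
        switches.update(path_to("src", r, tree)[1:-1])
    return groups * len(switches)


unicast_cost = unicast_link_sends(TREE, RECEIVERS)
multicast_cost = multicast_link_sends(TREE, RECEIVERS)
few_channels = multicast_state_entries(TREE, RECEIVERS, groups=10)
many_streams = multicast_state_entries(TREE, RECEIVERS, groups=1_000_000)
```

In this toy case multicast cuts link transmissions from 12 to 7, and ten channels cost only 30 table entries in total. A million distinct streams would need three million entries across the same three switches, which is the scarce resource the paragraph above points at.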

A maybe-radical future: stop switching packets, schedule them

Foster floats a forward-looking idea from a colleague’s recent paper. Today’s networks are built on packet switching: you just spray packets into the network and shared resources get used up efficiently by the crowd, no central scheduling required. Simple, robust, the default since the 1960s. But supporting multicast on packet-switched routers is a nightmare to build in hardware, because many things happen concurrently and the fast middle of a router can’t do everything in parallel when resources are busy.

Machine-learning training workloads, though, are unusually regular and predictable — the same communication patterns repeat. So why switch packets at all? If you know in advance exactly what data moves when, you could build a far simpler, cheaper, faster switch that just executes a precomputed schedule. The honest catch: this only pays off if you could “boil the ocean” and rebuild infrastructure from scratch. But the ML boom is, in effect, building several new oceans — so it’s exactly the moment to try strange things.
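As a rough illustration of “executing a precomputed schedule”, here is a Python sketch of a switch with no packet headers, lookups, or queues. It is not the design from the paper Foster mentions; it assumes a simple all-to-all exchange among a handful of ports and uses a rotation schedule, where in time slot t each input i is wired to output (i + t) mod n. All names are invented for the example.

```python
def rotation_schedule(n_ports):
    """Slot t connects input i to output (i + t) % n_ports, for t = 1..n_ports-1."""
    return [
        {i: (i + t) % n_ports for i in range(n_ports)}
        for t in range(1, n_ports)
    ]


def run_schedule(schedule, outbox):
    """Deliver data purely according to the schedule.

    outbox[i][j] is the data port i wants delivered to port j.
    Returns inbox, where inbox[j][i] is what port j received from port i.
    """
    inbox = {port: {} for port in outbox}
    for slot in schedule:
        # Each slot must be a permutation, so no two inputs share an output.
        if len(set(slot.values())) != len(slot):
            raise ValueError("schedule has an output conflict")
        for src, dst in slot.items():
            inbox[dst][src] = outbox[src][dst]
    return inbox


N = 4
schedule = rotation_schedule(N)

# Every port has one chunk for every other port (think gradient shards).
outbox = {
    i: {j: f"chunk {i}->{j}" for j in range(N) if j != i}
    for i in range(N)
}

inbox = run_schedule(schedule, outbox)
```

Because every slot is a permutation, contention is ruled out when the schedule is computed rather than resolved at runtime, which is why such a switch could in principle be much simpler than a packet switch. The catch is the one noted above: the traffic pattern has to be known in advance.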

Butane: bringing software discipline to Jane Street’s wide-area network

The applied finale. Jane Street’s global network had been managed the “dark ages” way, with engineers writing low-level configs for individual BGP routers. BGP (Border Gateway Protocol) is the internet’s routing protocol: every organization tells its neighbors which destinations it can reach and at what cost, and each router compares the advertisements it hears and picks what it thinks is best. It’s a fully distributed system of local decisions, and remarkably it converges to stable, sensible routes, for economic-structural reasons captured by the “Gao-Rexford conditions.” BGP is also commonly used inside big organizations, because it handles dynamism gracefully: links fail, and BGP recovers without central coordination.
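The advertise-and-select loop can be sketched in a few lines of Python. This is a toy path-vector model, not real BGP: it assumes every router simply prefers the shortest path (with an alphabetical tie-break), ignores the business policies that the Gao-Rexford conditions are about, and models a failure by recomputing from scratch instead of processing withdrawals. The four-router topology is invented.

```python
NEIGHBOURS = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["B", "C"],
}


def converge(neighbours, origin, down=frozenset()):
    """Repeat rounds of advertise-and-select until no router changes its choice.

    `down` is a set of failed links, each written as frozenset({x, y}).
    Returns a dict mapping each router to its chosen path toward origin.
    """
    best = {origin: (origin,)}
    changed = True
    while changed:
        changed = False
        for router in sorted(neighbours):
            if router == origin:
                continue
            offers = []
            for peer in neighbours[router]:
                if frozenset((router, peer)) in down:
                    continue
                path = best.get(peer)
                # Loop prevention: ignore paths that already contain us.
                if path is not None and router not in path:
                    offers.append((router,) + path)
            # Local decision: shortest path wins, ties broken alphabetically.
            choice = min(offers, key=lambda p: (len(p), p)) if offers else None
            if choice != best.get(router):
                if choice is None:
                    best.pop(router, None)
                else:
                    best[router] = choice
                changed = True
    return best


routes = converge(NEIGHBOURS, origin="D")

# Simulate losing the B-D link: each router again decides locally.
rerouted = converge(NEIGHBOURS, origin="D", down={frozenset(("B", "D"))})
```

No router in this model sees the whole topology; each one only compares what its neighbours offer. After the B-D link fails, B ends up routing through C without any central coordinator stepping in.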

Foster’s system, Butane, sits on top. You write a high-level policy — checked into a repository, reviewed and tested like any other code — and it compiles down into BGP snippets for every router. The default just works (traffic gets there somehow); where you care, you can demand certain traffic take the fastest path. Deliberately, the policy language is simple and compiles in a straightforward, legible way — “kind of like OCaml” — rather than being maximally fancy. Foster confesses he found that a little disappointing, then concedes it was the right call: an abstraction you can peel back and understand beats a clever one you can’t.

The surprise was what engineers loved most. Not the elegant abstraction — the tooling. A UI that shows the expected change in latency between sites before you push anything; “what if we lose this link?” exploration. And the deepest shift in how Foster thinks about formal methods:

“The real power is, you know, now we can start to explore. So we can start to take bigger steps and even automate some of the exploration of those steps.”

Formal methods are usually sold as a seatbelt — a way to stop bugs. But because Butane has a tested mathematical model of how BGP behaves, it stops being just a correctness checker and becomes a way to explore the design space with confidence. Already, the team has started using it for capacity planning — deciding which future fiber links to buy — by combining the model with historical traffic. The frontier (still research) is a solver that takes a goal like “make this hotspot disappear with minimal changes” and proposes the configuration itself.
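A “what if we lose this link?” query can be sketched in Python as follows. This is not how Butane works internally: Butane models BGP behavior, whereas this example assumes traffic simply takes the lowest-latency path and uses Dijkstra's algorithm as a stand-in. The site names and latency figures are invented.

```python
import heapq

# Hypothetical inter-site links with one-way latency in milliseconds.
LATENCY_MS = {
    ("east", "west"): 30,
    ("east", "europe"): 35,
    ("west", "asia"): 60,
    ("europe", "asia"): 90,
}


def latency(links, src, dst):
    """Lowest total latency from src to dst, or None if unreachable."""
    graph = {}
    for (a, b), ms in links.items():
        graph.setdefault(a, []).append((b, ms))
        graph.setdefault(b, []).append((a, ms))
    dist, heap = {src: 0}, [(0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            return d
        if d > dist[node]:
            continue  # stale heap entry
        for nxt, ms in graph.get(node, []):
            if d + ms < dist.get(nxt, float("inf")):
                dist[nxt] = d + ms
                heapq.heappush(heap, (d + ms, nxt))
    return None


def what_if_lost(links, failed):
    """Report every site pair whose latency changes if `failed` goes down."""
    remaining = {k: v for k, v in links.items() if k != failed}
    sites = sorted({s for pair in links for s in pair})
    report = {}
    for i, a in enumerate(sites):
        for b in sites[i + 1:]:
            before = latency(links, a, b)
            after = latency(remaining, a, b)
            if before != after:
                report[(a, b)] = (before, after)
    return report


impact = what_if_lost(LATENCY_MS, ("west", "asia"))
```

With these made-up numbers, losing the west-asia link raises asia-east latency from 90 ms to 125 ms and asia-west from 60 ms to 155 ms, while leaving every other pair unchanged. The same loop, run over candidate new links instead of failed ones, is the flavor of question capacity planning asks.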

Key Takeaways

The core idea: treat an entire network as a single program you write, review, and test — instead of hand-configuring each switch and router individually. This is “software-defined networking” (SDN).
Two payoffs of SDN: flexibility (deploy new behavior at software speed, no waiting on hardware vendors or standards bodies) and reasoning (predict and verify the whole network’s behavior from a small program).
The reason SDN became possible now is economic, not technical: it only works when one entity controls the whole network. The public internet resists it because tens of thousands of independent organizations must interoperate.
A lens is a single abstraction that generates both directions of a data conversion (A→B and B→A) from one definition, with guarantees they stay consistent. Foster’s PhD work; later adopted broadly by the Haskell community.
The “one weird trick” of programming-language theory: the best language ideas come from relating a messy practical domain to a simple mathematical model. NetKAT (a network language) turned out to be an instance of Kleene algebra with tests, a decades-old theory.
Languages anchored to clean math generalize better — features added for one problem tend to solve others and compose cleanly.
The biggest practical win of SDN may be cultural, not semantic: it lets networks adopt version control, code review, and testing. The old way was logging into a router and “yolo-ing” a change live.
Network verification (proving properties on a snapshot of forwarding behavior using logic solvers) is now routine at hyperscalers and “mostly just works” — but networks are distributed systems, so outages from unmodeled interactions persist.
In-network computing: a programmable router is “just another processor” (short on memory, fast on throughput), so it could do useful work on packets in transit. Controversial because it conflicts with the end-to-end principle (keep the network dumb, put smarts at the edges).
Multicast efficiently sends the same data to many recipients via distribution trees. It failed on the public internet because the data plane is cheap but the control plane (tree bookkeeping) is expensive and doesn’t scale to millions of distinct streams. It works great in trading, where there are few channels.
Packet switching (spray packets, let the crowd use resources efficiently, no scheduling) has been the default since the 1960s. For regular, predictable ML training traffic, a scheduled switch could be simpler, cheaper, and faster — but only if you rebuild infrastructure from scratch.
BGP (Border Gateway Protocol) is a distributed routing protocol: each router advertises reachable destinations to neighbors and picks the best advertisements it hears. It converges to stable routes for economic-structural reasons (the Gao-Rexford conditions) and is used inside organizations too, because it recovers from link failures without central coordination.
Butane (Foster’s Jane Street system) compiles a centralized, reviewed, high-level routing policy down into per-router BGP configs — getting software discipline while keeping BGP’s distributed, failure-tolerant implementation.
A simple, legibly-compiling abstraction beats a clever opaque one. Butane’s policy language was deliberately kept modest so engineers can peel it back and see how it maps to actual configs.
Formal methods reframed: a trusted model isn’t just a bug-catching seatbelt — it lets you explore the design space (what-if analyses, capacity planning, automated change proposals) with confidence.
Healthy research celebrates ideas that don’t pan out — not failures to solve a problem, but solutions the world chose not to adopt. Over-orthodoxy causes stagnation (“ossification” of the internet).

Claude’s Take

This is a strong conversation, and not just because two people who clearly like each other are riffing — it’s structured around one genuinely portable idea (clean math hiding inside messy systems) that gets demonstrated three separate times: lenses, NetKAT, and Butane. That repetition is the point, not padding. If you take nothing else from it, take the “one weird trick” framing, because it generalizes well beyond networking into how good abstractions get found anywhere.

The honesty is the best part. Foster repeatedly admits the disappointing version of the truth — his “network as a program” paper got rejected for being obvious, his in-network computing work is “still very controversial,” his Butane policy language is deliberately less fancy than he’d have liked. A weaker talk would have airbrushed those. Here they’re the most useful bits, because they show the gap between the elegant idea and what actually ships. The multicast digression is the standout: a textbook case of a “great” technology failing for a non-obvious reason (data plane cheap, control plane expensive) and then quietly thriving in the one niche where the constraints flip.

Two caveats keep this at an 8 rather than higher. First, it’s a Jane Street podcast and the back third is partly a tour of Jane Street’s own systems and culture — interesting, but the “look how nice it is here” register creeps in. Second, the conversational format means several deep ideas (Kleene algebra, routing algebras, the Gao-Rexford conditions) get name-dropped and gestured at rather than actually explained; you leave knowing they exist and matter, not how they work. That’s the nature of the medium, not a flaw exactly, but it caps how much you actually learn versus how much you’re pointed toward. For a curious generalist it’s an excellent map of an unfamiliar territory, drawn by someone honest about which roads are paved and which are still dirt.