Why next-gen AI scale-up needs CPO
ELI5 / TLDR
Inside an AI data center, thousands of chips need to talk to each other constantly. They do this over copper wires, the same metal in the cables behind your TV. But copper has a hard limit: the faster you push data through it, the shorter the distance it can travel. We are now bumping into that wall. The fix is to send the signals as light through glass fibers instead, and the trick called co-packaged optics (CPO) is about doing that conversion from electricity to light right next to the chip, instead of inches away. Done close enough, it cuts the wasted energy and delay that today’s plug-in optical parts suffer from.
The Full Story
Why copper has a ceiling
Start with the thing nobody questions: copper is everywhere in a computer. It connects the tiny transistors, it runs across the motherboard, and it forms the big “spine” that lets 72 chips inside one Nvidia rack act as a team. Without copper, microchips do not work.
But copper has a quiet flaw. The faster you shove data down a copper wire, the shorter the distance the signal survives before it falls apart. Think of it like shouting. You can yell a short message clearly across a room, but try to yell a fast, complicated paragraph across a football field and it turns to mush. At today’s top AI speeds (200 gigabits per second per lane), copper gives you about two meters. That is it.
“Use copper when you can and optical when you must.”
That phrase is the whole video in one line. Beyond a couple of meters, you have no choice but to switch to light through fiber optics.
Three networks, three jobs
A modern AI server isn’t one network, it’s three stacked on top of each other.
The front-end network is the boring one every server has always had: loading data, logins, user requests.
The scale-up network is the intense one. It wires together all the chips inside a single rack so tightly that dozens of separate GPUs behave almost like one giant GPU. This needs absurd bandwidth and near-zero delay. Nvidia’s NVLink inside the NVL72 rack is the famous example. Because everything sits within a single rack (under two meters), this layer is copper.
The scale-out network connects rack to rack across the whole data center floor. The distances are too big for copper, so this layer has been optical for years. To put the appetite in perspective: scale-out needs roughly 8–10x the bandwidth of the front end, and scale-up needs another 10x on top of that.
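To see how those multipliers stack, here is a minimal sketch that normalises the front-end network to 1 and applies the rough ratios quoted above. The function name and the normalisation are illustrative assumptions, and the figures are the ballpark ratios from the text, not measurements.

```python
# Relative bandwidth per tier, normalised so the front-end network = 1.
# The multipliers are the rough ranges quoted in the text, not measurements.
FRONT_END = 1
SCALE_OUT_RANGE = (8, 10)  # scale-out is roughly 8 to 10x the front end
SCALE_UP_FACTOR = 10       # scale-up is roughly another 10x on top of that


def relative_bandwidth():
    """Return (low, high) relative bandwidth for each network tier."""
    out_lo, out_hi = (FRONT_END * m for m in SCALE_OUT_RANGE)
    return {
        "front-end": (FRONT_END, FRONT_END),
        "scale-out": (out_lo, out_hi),
        "scale-up": (out_lo * SCALE_UP_FACTOR, out_hi * SCALE_UP_FACTOR),
    }


for tier, (lo, hi) in relative_bandwidth().items():
    print(f"{tier}: {lo}x to {hi}x the front end")
```

Under these ratios, scale-up ends up at roughly 80 to 100 times the front-end bandwidth, which is why it is the tier that stresses copper the most.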
The hidden villain: the DSP
Today, the standard way to do optics in a data center is a pluggable transceiver, a little module you plug into the back of a server tray that both sends and receives light. It has four parts: the connector, a chip called a DSP (digital signal processor), a transmitter with the laser, and a receiver sensor.
Here is the surprise. Ask which part burns the most power and you’d guess the laser. It’s a laser, after all. But no, the laser is only about 15% of the energy. The real hog is the DSP, eating up to 60% of the power. On delay it’s even worse: the DSP causes over 90% of the lag the module adds.
Why does this DSP even exist? Because by the time the electrical signal travels the ~30 cm from the GPU, across the package, over the motherboard, and out to the transceiver, it’s gotten messy. The DSP’s job is to clean and boost that battered signal before turning it into light. Useful, but expensive in both watts and nanoseconds.
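The shares quoted above make it easy to estimate what a DSP-free module would look like. The sketch below is back-of-the-envelope arithmetic only: the function name is hypothetical, the inputs are normalised to 100 rather than real watts or nanoseconds, and it assumes nothing else has to be added to replace the DSP, which real designs would not get for free.

```python
# Shares quoted in the text for a DSP-based pluggable transceiver, in percent.
LASER_POWER_PCT = 15   # laser: about 15% of module power
DSP_POWER_PCT = 60     # DSP: up to 60% of module power
DSP_LATENCY_PCT = 90   # DSP: over 90% of the latency the module adds


def remaining_without_dsp(module_power, module_latency):
    """Rough power and latency left if the DSP vanished and nothing replaced it."""
    power_left = module_power * (100 - DSP_POWER_PCT) / 100
    latency_left = module_latency * (100 - DSP_LATENCY_PCT) / 100
    return power_left, latency_left


# Normalised inputs (100 = the whole module), so results read as percentages.
power_left, latency_left = remaining_without_dsp(100, 100)
print(f"power left: {power_left:.0f}%, latency left: {latency_left:.0f}%")

# With the DSP gone, the laser becomes a much bigger slice of what remains.
laser_share_after = LASER_POWER_PCT / power_left * 100
print(f"laser share of remaining power: {laser_share_after:.1f}%")
```

Taking the DSP at its upper figures, removing it leaves about 40% of the power and at most 10% of the added latency, which is the whole motivation for what follows.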
What CPO actually is
“The entire reason CPO even exists is to eliminate the need for a DSP in an optical transceiver.”
The idea is simple: move the optical conversion so close to the chip that the signal never has time to degrade, so you don’t need a DSP at all. Several approaches creep toward this:
- LPO (linear pluggable optics): keep the plug-in module but rip out the DSP and just send the slightly-messy signal as light, hoping for the best. It works, but only over shorter distances.
- OBO (on-board optics): move the optics onto the board. A nice idea that combined the worst of both worlds: it still needed a DSP and lost the easy swap-out.
- NPO (near-package optics): move the optics right up against the chip. Close enough to start dropping the DSP, and it’s being deployed today.
- CPO (co-packaged optics): the optical engine sits on the same package as the chip, just as the name says. At its most advanced (sitting on a shared “interposer,” with future tricks like 3D stacking), it kills not just the DSP but also the SerDes conversion circuitry, making it the leanest possible setup.
A key nuance: putting CPO next to the networking switch (what Nvidia’s Quantum and Spectrum-X chips do now) is one thing. Putting it next to the GPU itself is a whole different level of ambition.
The catch nobody admits at first
CPO sounds like a clean win. It isn’t, and the holdup is human, not physical.
Pluggable transceivers are loved because they’re pluggable. One fails, a technician swaps it in seconds. There are many suppliers, so prices stay honest and supply never dries up. With CPO, the optics are baked into the hardware: buy Nvidia, you buy Nvidia’s optics; one optical port dies and you replace the whole switch. That is vendor lock-in, and hyperscalers hate it more than they love efficiency.
“Hyperscalers can see the technical benefits, but they also want to avoid a vendor lock-in at all costs.”
So the market splits. Big established cloud players stay cautious and even push NPO-plus-pluggable hybrids. Newer “neoclouds” love the turnkey simplicity and lean into CPO. Because scale-out was already optical, that’s where CPO lands first.
Scale-up is the real prize
Scale-up is harder because copper here is genuinely excellent. Copper needs no signal translation at all, so its latency is tiny, about 10 nanoseconds across two meters, versus the 150–200 ns a DSP adds alone. As long as copper keeps scaling, it wins.
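To put that gap in numbers, here is a small sketch using only the figures quoted above. The per-meter value is derived from "about 10 nanoseconds across two meters", and the function names are illustrative.

```python
COPPER_NS_PER_M = 5          # derived from ~10 ns across 2 m of copper
DSP_LATENCY_NS = (150, 200)  # delay the DSP adds on its own (low, high)


def copper_latency_ns(length_m):
    """Approximate signal delay over a copper link of the given length."""
    return COPPER_NS_PER_M * length_m


def dsp_penalty_ratio(length_m):
    """How many times larger the DSP delay is than the copper delay."""
    copper = copper_latency_ns(length_m)
    return tuple(d / copper for d in DSP_LATENCY_NS)


print(copper_latency_ns(2))  # 10
print(dsp_penalty_ratio(2))  # (15.0, 20.0)
```

Across a two-meter rack, the DSP alone costs 15 to 20 times the delay of the copper it would replace, which is why scale-up has no reason to leave copper while copper still reaches.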
But the wall is close. Today’s 224G copper works using a signaling trick called PAM4. Next-gen 448G will need PAM6 or PAM8, cramming more voltage levels onto the wire, which makes the signal shakier and shrinks copper’s reach below two meters. At some point the noise wins.
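Why do more voltage levels make the signal shakier? A rough, idealised sketch: a PAM-N signal carries log2(N) bits per symbol, but it has to split the same voltage swing into N-1 stacked "eyes", so each eye is smaller and easier for noise to close. The code below compares PAM4, PAM6 and PAM8 against plain two-level signaling. It is a textbook approximation that ignores coding, equalisation and channel loss, and practical PAM6 schemes may carry slightly fewer bits per symbol than the ideal figure.

```python
import math


def bits_per_symbol(levels):
    """Ideal number of bits one PAM symbol can carry."""
    return math.log2(levels)


def eye_penalty_db(levels):
    """Eye-height penalty versus 2-level signaling at the same voltage swing.

    N levels split the swing into N-1 eyes, so each eye is 1/(N-1) as tall.
    """
    return 20 * math.log10(levels - 1)


for n in (4, 6, 8):
    print(
        f"PAM{n}: {bits_per_symbol(n):.2f} bits/symbol, "
        f"eye penalty {eye_penalty_db(n):.1f} dB"
    )
```

Going from PAM4 to PAM8 buys 50% more bits per symbol (2 to 3) but costs roughly 7 dB more eye height in this idealised model, and that lost margin is what comes out of copper’s reach.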
This is where CPO’s payoff lands. Blackwell’s huge leap wasn’t just the GPU; it was the NVL72 rack jumping from 8 GPUs acting as one to 72 acting as one. CPO at the GPU level could push that to thousands. At GTC 2026 Jensen Huang announced exactly this: Vera Rubin Ultra NVL576, eight NVL72 racks (8 × 72 = 576) stitched together with a mix of copper inside each rack and optics between them, with Kyber NVL1,152 already on the horizon.
“It might be time to change the principle of ‘use copper when you can and optics when you must’ to ‘use copper as long as you can,’ because the wall is approaching fast.”
Key Takeaways
- Copper’s reach shrinks as speed rises: ~2 m at 200 Gbps per lane, and that ceiling drops further with each speed jump.
- AI servers run three network tiers: front-end (basic), scale-up (rack-internal, copper, highest bandwidth), scale-out (rack-to-rack, optical).
- Bandwidth ratios: scale-out needs 8–10x the front end; scale-up needs ~10x the scale-out.
- In a pluggable transceiver, the laser uses only ~15% of power; the DSP uses up to 60% and causes 90%+ of the added latency.
- The DSP in a pluggable transceiver adds ~150–200 ns of latency on its own. Copper adds ~5 ns per meter (about 10 ns across two meters).
- CPO’s entire purpose is to eliminate the DSP by placing the optical engine on the same package as the chip.
- The progression is LPO → OBO → NPO → CPO, each step either dropping the DSP or moving the optics closer to the chip.
- Advanced CPO (shared interposer) also eliminates SerDes; hybrid bonding / 3D stacking could go further.
- The real adoption barrier is vendor lock-in and loss of repairability, not technical performance. A failed optical port means replacing the whole switch.
- Scale-out adopts CPO first because it was already optical; scale-up stays copper until copper’s physical limit is hit.
- Copper signaling uses PAM (pulse amplitude modulation); moving from PAM4 to PAM6/PAM8 raises speed but reduces reach.
- Nvidia keeps copper scale-up for Rubin and Feynman, but Vera Rubin Ultra NVL576 mixes copper (in-rack) with optics (rack-to-rack) to reach a 576-GPU world size; Kyber NVL1,152 follows.
Claude’s Take
This is a genuinely good explainer, the kind SemiAnalysis does well: it takes a topic that sounds like marketing noise (co-packaged optics), grounds it in a single physical fact (copper falls apart fast at high speed), then builds everything else on top. The pacing is clean, the power/latency breakdown of the DSP is the memorable payload, and the honesty about why hyperscalers are dragging their feet (lock-in, repairability, price control) is more useful than the usual breathless hardware hype.
What keeps it from a 9 or 10 is that it’s a teaser. The video says so itself: this is “a tiny part” of the paid SemiAnalysis deep dive, and you can feel the soft sell at the end. It glosses over the hard engineering of how you actually couple a laser into silicon reliably at scale, which is the part everyone struggles with. The numbers (15%, 60%, 90%, 2 m) are stated as facts without sourcing, though they match what’s broadly reported in the industry, so I’d trust them as directional rather than gospel.
No real BS here, just commercial framing. If you want the mental model for why optics are eating the data center from the outside in, this delivers it in 24 minutes. Score: 8.
Further Reading
- SemiAnalysis’ full CPO deep dive and AI Networking model (referenced in the video; paywalled).
- Nvidia GTC 2026 keynote, for the Vera Rubin Ultra NVL576 and Kyber NVL1,152 announcements.
- Background on PAM4 signaling and SerDes, the electrical-signaling techniques that set copper’s reach limits.
- TSMC SoIC and advanced packaging (interposers, hybrid bonding), the substrate tech that makes tight optical co-packaging possible.