WHY CO-PACKAGED OPTICS FAILED IN 2011 - AND WHY IT WON'T THIS TIME: IBM Europe | Jose Pozo CTO Optica
ELI5/TLDR
Computer chips talk to each other using electricity through copper wires, which is cheap and easy. But as AI gets hungrier, copper can’t move data fast enough over distance, so the industry wants to switch to light (optics) — which is faster but far more fiddly to build. An IBM researcher in Zurich explains that IBM actually tried putting optics right next to the chip back in 2011, and it flopped because the industry found cheaper workarounds. His argument: this time AI is so data-starved there’s no workaround left, so the trick is to stop treating light as a separate add-on and instead bolt it onto the chip in the same single step you already use for the electrical wiring.
The Full Story
What “co-packaged optics” even means
Start with the boring thing that works: an electrical link. Two chips, copper wires between them, maybe a connector. The signal starts as electricity, ends as electricity, and everything in between is plumbing. Simple.
Now light. To send a signal as light you need a laser to make the light, a modulator to flick it on and off in the shape of your data, a special driver to run the modulator, mirror-clean lenses aligned to a hair’s width, plus gadgets to cram many colors of light down one fiber (multiplexers) and pull them apart again at the far end, and amplifiers along the way. Light moves data beautifully. It just drags a whole circus of extra parts behind it.
“The basic challenge of optics is that it is complicated.”
That one line is the spine of the entire talk. “Co-packaged optics” (CPO) is the idea of moving all that optical circus from the far edge of the board to sit right on the same package as the processor. Think of it like this: instead of your message leaving the house and walking to a post office across town, and only there being converted from “spoken word” to “letter mailed in an envelope,” you put the post office inside the front hallway. The closer the light-conversion happens to the chip, the less the signal has to struggle through copper first.
Why the chip suddenly needs this
The speaker, Bert Offrein (IBM Research Zurich), frames it with a 2008 supercomputer called Roadrunner — the first machine to hit one petaflop (a million billion calculations a second). Impressive then. Today it would need 10,000 days — about 27 years — to train one modern neural network. Compute has sprinted ahead.
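As a rough check on that comparison, the arithmetic can be sketched in a few lines of Python. The constant one-petaflop rate and the 365.25-day year are simplifying assumptions, and the function name is made up for illustration.

```python
# Back-of-envelope check of the Roadrunner comparison in the text.
# Assumption: the machine sustains exactly 1 petaflop (1e15 flop/s) throughout.
PETAFLOP_PER_SECOND = 1e15
SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365.25  # assumed average year length


def total_flops(days: float, rate: float = PETAFLOP_PER_SECOND) -> float:
    """Total floating-point operations performed in `days` at a constant rate."""
    return days * SECONDS_PER_DAY * rate


days = 10_000
years = days / DAYS_PER_YEAR
flops = total_flops(days)

print(f"{days} days is about {years:.1f} years")
print(f"Implied training budget: {flops:.2e} floating-point operations")
```

Under these assumptions the quoted 10,000 days corresponds to a training budget somewhere between 10^23 and 10^24 floating-point operations.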
The problem is that moving data hasn’t kept pace with crunching data. He puts a number on the gap: early systems shifted roughly one byte of communication for every calculation (one byte per flop). Today’s systems manage less than a hundredth of that. The chips can think far faster than they can talk to memory or to each other. That starvation is the whole reason optics is back on the table.
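To make the byte-per-flop gap concrete, here is a minimal sketch. The exaflop-class machine is a hypothetical example, not a system named in the talk; only the two ratios (about 1 byte per flop then, under 0.01 now) come from the text.

```python
def bytes_per_flop(bandwidth_bytes_per_s: float, flops_per_s: float) -> float:
    """Communication-to-compute ratio of a system."""
    return bandwidth_bytes_per_s / flops_per_s


def bandwidth_needed(flops_per_s: float, target_bytes_per_flop: float) -> float:
    """Bandwidth (bytes/s) required to sustain a given bytes-per-flop ratio."""
    return flops_per_s * target_bytes_per_flop


# Hypothetical machine, chosen only for illustration: 1e18 flop/s.
compute = 1e18

balanced = bandwidth_needed(compute, 1.0)   # the early-system ratio from the text
starved = bandwidth_needed(compute, 0.01)   # upper bound on today's ratio from the text
shortfall = balanced / starved

print(f"Bandwidth for 1 byte/flop:    {balanced:.1e} bytes/s")
print(f"Bandwidth for 0.01 byte/flop: {starved:.1e} bytes/s")
print(f"Gap between the two: at least {shortfall:.0f}x")
```

Because the text says today's ratio is below one hundredth, the factor computed here is a lower bound on the gap, not an exact figure.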
“We see a kind of reversal.”
Here’s the reversal he means. For years, chipmakers kept cramming everything onto one ever-bigger single chip, because making the chip faster was the easy win. That trend has stalled — single chips can’t keep growing — so the industry is going back to stitching many chips together into one package (multi-chip modules). And the moment you have many separate chips that all need to gossip at full speed, the wiring between them becomes the bottleneck. Optics is the way to widen that pipe.
The ghost of 2011
This is the honest core of the talk, and where it earns its title. IBM didn’t just theorize about CPO; they shipped it. In 2011 they built a supercomputer (he calls it “PERCS”) that was, in his words, the first real co-packaged optics system, though nobody used that name yet. He shows a switch chip with 56 sites, each holding a 12-channel optical transceiver, with channels running first at 10 and later at 25 gigabits per second.
And here’s the assembly nightmare. In electronics you drop a chip on a substrate, drop the substrate on the board, done — thousands of connections made in a couple of steps. With those 56 optical transceivers, you had to build the full optical circus and assemble it 56 separate times, then route every single fiber so it found exactly the right slot at the front of the chassis. Enormous, fiddly, expensive labor.
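The scale of that job follows from the figures quoted for the switch chip. The sketch below assumes the 10 and 25 gigabit rates are per channel and ignores any split between transmit and receive, which the summary does not specify.

```python
# Figures quoted in the text for the 2011 switch chip.
SITES = 56              # optical transceiver sites on the switch chip
CHANNELS_PER_SITE = 12  # channels per transceiver
RATES_GBPS = (10, 25)   # assumed to be per-channel rates, earlier and later

total_channels = SITES * CHANNELS_PER_SITE

# Aggregate optical bandwidth in Tbit/s for each per-channel rate.
aggregate_tbps = {rate: total_channels * rate / 1000 for rate in RATES_GBPS}

print(f"Optical channels to assemble and route: {total_channels}")
for rate, tbps in aggregate_tbps.items():
    print(f"At {rate} Gbit/s per channel: {tbps:.2f} Tbit/s aggregate")
```

Every one of those channels had to be carried by a fiber routed to the right slot at the front of the chassis, which is where the labor went.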
“We thought, wow, this is it. Now optics will be there massively. It didn’t happen.”
It didn’t happen because the industry simply found ways around needing optics — clever tricks to live with the slower copper links. The pain of all that hand-assembly outweighed the benefit. The technology worked; the economics didn’t.
Why he thinks it sticks this time
His core thesis: nothing has changed about optics being complicated, so the answer isn’t to wish the complexity away. It’s to eliminate the overhead, meaning all that extra assembly labor. Kill the labor and CPO survives.
The flagship idea is the polymer waveguide. A waveguide is just a tiny channel that pipes light, the way a copper trace pipes electricity. IBM’s trick is to grow these guides out of a polymer (a flexible plastic-like material) and couple them to the silicon photonics chip by something called adiabatic coupling. “Adiabatic” sounds scary; the picture is gentle. Imagine narrowing a river so smoothly that the water has no choice but to slide sideways into an adjacent channel without splashing or losing energy. They taper the silicon channel down to a fine point, and the light, having nowhere else to go, eases over into the polymer channel — which is sized to match an optical fiber on the other end. Smooth handoff, low loss.
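The “smooth handoff” can be illustrated with a toy two-mode coupled-mode model. This is a textbook-style sketch, not IBM's taper design: the units are dimensionless, the coupling strength `kappa` and sweep range `detune_max` are arbitrary assumptions, and the taper is modeled as a linear sweep of the mismatch between the two guides.

```python
def transferred_power(taper_length, kappa=1.0, detune_max=20.0, steps_per_unit=200):
    """Fraction of light that ends up in guide 2 after a linear taper.

    Toy model (all values are illustrative assumptions):
      da1/dz = -i * ( delta/2 * a1 + kappa * a2)
      da2/dz = -i * (-delta/2 * a2 + kappa * a1)
    where delta sweeps linearly from -detune_max to +detune_max along the taper.
    """
    n = max(1, round(taper_length * steps_per_unit))
    dz = taper_length / n

    def deriv(z, a1, a2):
        delta = detune_max * (2.0 * z / taper_length - 1.0)
        return (-1j * (0.5 * delta * a1 + kappa * a2),
                -1j * (kappa * a1 - 0.5 * delta * a2))

    a1, a2 = 1.0 + 0.0j, 0.0 + 0.0j  # all light starts in guide 1 (the silicon side)
    for i in range(n):
        z = i * dz
        # Classic fourth-order Runge-Kutta step.
        k1 = deriv(z, a1, a2)
        k2 = deriv(z + dz / 2, a1 + dz / 2 * k1[0], a2 + dz / 2 * k1[1])
        k3 = deriv(z + dz / 2, a1 + dz / 2 * k2[0], a2 + dz / 2 * k2[1])
        k4 = deriv(z + dz, a1 + dz * k3[0], a2 + dz * k3[1])
        a1 += dz / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        a2 += dz / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return abs(a2) ** 2


gentle = transferred_power(100.0)  # long, slow taper
abrupt = transferred_power(2.0)    # short, fast taper
print(f"Gentle taper transfers {gentle:.1%} of the light")
print(f"Abrupt taper transfers {abrupt:.1%} of the light")
```

In this model the long taper hands nearly all the light to the second guide while the short one leaves most of it behind, which is the qualitative point of the river analogy: the transition has to be slow compared with the coupling between the two channels.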
Two practical wins make this more than a lab demo:
First, the polymer guides survive solder reflow — the standard oven-bake step where boards get assembled. Translation: optics can ride along on the same manufacturing line as ordinary electronics instead of needing a precious separate process. He confirms under questioning that they buy these polymer guides from an outside supplier and have passed both solder-reflow and humidity (THB) reliability testing.
Second, and this is the punchline to 2011: he shows a flip-chip attach — the routine way you slap a chip face-down onto a substrate — that makes 100 optical connections in a single step. Recall that 2011 meant assembling optics 56 painful times. The dream here is one stamp, many connections, optical and electrical simultaneously, indistinguishable from how electronics already gets built.
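The contrast with 2011 can be put in numbers. This sketch treats one transceiver channel in 2011 as equivalent to one “optical connection” today, which is an assumption made here for comparison; the class and field names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class AssemblyFlow:
    name: str
    connections_per_step: int  # optical connections made by one assembly step

    def steps_for(self, connections: int) -> int:
        """Assembly steps needed to make the given number of optical connections."""
        return math.ceil(connections / self.connections_per_step)


# 2011: one transceiver placement per site, 12 channels each (from the text).
flow_2011 = AssemblyFlow("2011 transceiver-by-transceiver", connections_per_step=12)
# Demonstrated flip-chip attach: 100 optical connections in one step (from the text).
flow_now = AssemblyFlow("single flip-chip attach", connections_per_step=100)

target = 56 * 12  # the channel count of the 2011 switch chip
for flow in (flow_2011, flow_now):
    print(f"{flow.name}: {flow.steps_for(target)} steps for {target} connections")
```

The comparison is only as good as the channel-equals-connection assumption, and it leaves out the fiber routing that dominated the 2011 effort.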
The further-out bets
Two projects sit beyond the near term. With DARPA, IBM wants to rebuild in light what electronics already does in copper: dense 3D routing of optical channels packed as tightly as 3 micrometers apart, with tiny built-in mirrors (“turning elements”) that bend light around corners and even straight up and down through the substrate, forming vertical optical vias. Early mirror losses are around 1 decibel (still too lossy; they’re working it down). The longer reach is a chip-to-chip optical fabric as seamless as today’s electrical one.
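To see why about 1 decibel is still too much, it helps to convert decibels into the fraction of light that survives. The sketch assumes the quoted figure applies per turning element and that losses simply add in decibels along a path; the mirror counts are hypothetical.

```python
def db_to_fraction(loss_db: float) -> float:
    """Fraction of optical power remaining after a loss given in decibels."""
    return 10 ** (-loss_db / 10)


def channels_per_mm(pitch_um: float) -> float:
    """Channels that fit side by side in one millimeter at a given pitch."""
    return 1000.0 / pitch_um


LOSS_PER_MIRROR_DB = 1.0  # figure quoted in the text, assumed here to be per mirror
PITCH_UM = 3.0            # channel spacing quoted in the text

for mirrors in (1, 2, 4):  # hypothetical numbers of turning elements in one path
    remaining = db_to_fraction(mirrors * LOSS_PER_MIRROR_DB)
    print(f"{mirrors} mirror(s): {remaining:.0%} of the light remains")

print(f"At {PITCH_UM} um pitch: about {channels_per_mm(PITCH_UM):.0f} channels per mm")
```

Under these assumptions a path that goes up through one via and back down through another already loses more than a third of its light to mirrors alone, before any coupling or propagation loss.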
He closes on the part most CPO talks skip: it’s not enough to integrate optics onto the first package. The fibers still have to fan out across the whole board and chassis without re-introducing the 2011 labor. The overhead has to die at every level, not just next to the chip.
The materials-vs-engineering aside
A nice exchange in Q&A. Asked whether the remaining blockers are engineering puzzles or harder material-science puzzles, he refuses the distinction.
“For me there’s not a direct discrepancy between engineering and material science. It goes hand in hand.”
If you want a reliable product, he argues, you have to understand your materials inside and out — so “just engineer it” was never really on offer.
Key Takeaways
- Co-packaged optics (CPO) = moving the light-generating and light-converting hardware from the board edge to sit directly on the processor’s package, to shorten the slow electrical path.
- Optics is inherently complex: it needs lasers, modulators, drivers, precisely aligned interfaces, multiplexers/demultiplexers, and amplifiers — versus copper’s “electrical in, electrical out.”
- IBM already shipped CPO in 2011 (the PERCS supercomputer) with a switch chip carrying 56 optical transceiver sites, 12 channels each at 10 then 25 Gbit/s.
- It failed commercially not on technology but on assembly economics — building and routing optics 56 times was brutal, and industry found copper workarounds instead.
- Communication has fallen catastrophically behind compute: from ~1 byte moved per flop in early systems to less than 1/100th of a byte per flop today.
- The “reversal”: chips are returning to multi-chip modules (many chips per package) after years of single-chip scaling, which makes inter-chip bandwidth the new bottleneck.
- The proposed fix is killing assembly overhead, not the complexity — via polymer waveguides coupled to silicon photonics by adiabatic (gradual taper) coupling.
- Polymer waveguides are solder-reflow compatible and have passed reflow + humidity (THB) reliability testing; IBM buys them from an external supplier.
- Demonstrated 100 optical connections in a single flip-chip attach — the direct rebuttal to 2011’s one-at-a-time assembly.
- A DARPA project targets 3D optical routing at 3-micrometer pitch with integrated turning mirrors and vertical optical vias; current mirror losses ~1 dB, being improved.
- IBM does the assembly work at its Bromont, Canada facility (~20% IBM-internal, ~80% external clients) and clean-room process work in Zurich and Yorktown Heights.
- IBM Research Zurich is celebrating its 70th anniversary (founded 1956) — IBM’s first lab outside the US.
Claude’s Take
This is a good, honest engineering talk dressed in a slightly clickbait conference title. The substance holds up: the “why it failed” story is unusually candid for an industry pitch — most CPO evangelists pretend the technology has always been on the cusp. Offrein instead says plainly that IBM shipped it 15 years ago and the market shrugged, then identifies the real killer (assembly overhead, not physics). That’s a credible diagnosis, and the byte-per-flop collapse is the cleanest one-number justification for why the industry can’t dodge optics this round.
Where to keep the salt handy: this is an IBM researcher at an Optica industry session, so it’s partly a “come collaborate with us” advertisement (he says so directly). The “100 connections in one attach” and the polymer-reflow compatibility are real, demonstrated wins, but they’re still lab and pilot-line results, not volume manufacturing — and the most exciting part (the DARPA 3D optical fabric) he honestly flags as “further out” with losses still too high. The thesis that this time is different is plausible but not proven; the same demand pressure could again get partially papered over by copper tricks and smarter packaging. The talk is also rough — it’s an auto-transcribed conference recording, so names are mangled (“perks” = PERCS, “vixel” = VCSEL, “tordia/THB” = humidity testing, “semos” = CMOS) and there are no visuals, which costs you a lot since half the argument lives in the slides.
Score 7: genuinely informative, refreshingly self-critical, and a clean mental model of why AI hardware is hitting a communication wall — docked for being a vendor talk, transcript-only without the diagrams, and resting its big claim on results that aren’t yet at scale.
Further Reading
- Roadrunner (IBM, 2008) — the first petaflop supercomputer; useful anchor for how fast compute has outrun communication.
- PERCS / IBM Power 775 — the 2011 system he credits as the first real co-packaged optics deployment.
- Adiabatic coupling in silicon photonics — the gentle-taper light-transfer trick at the heart of the polymer waveguide approach; worth a primer if the “river narrowing” analogy intrigued you.
- DARPA photonics / optical interconnect programs — context for the 3D optical routing and vertical-via work he describes.