Old Dragons Remember the First Fire: What Avici Taught Me About AI Infrastructure

Old dragons remember the first fire.

Some of us were there the first time the future showed up too early.

Back in the dot-com era, I worked at Avici Systems. We were building terabit-class routers for a world that was absolutely going to need them. That part was not wrong. The Internet was growing. Traffic was growing. The core was going to get bigger, faster, and less forgiving. The mistake, if you want to call it that, was not the shape of the architecture. The mistake was timing.

We were not wrong about where the world was going. We were early about when it would get there.

That is a different kind of failure. It is also a kind of failure that people who did not live through it often misunderstand.

When people talk about the dot-com bubble now, they tend to flatten it into a cartoon. Bad business plans. Too much venture money. Too much optimism. Too many companies with no revenue and a logo that looked like it was designed by someone drinking Red Bull out of a server rack. Some of that is fair. There was plenty of nonsense. There were companies selling fantasy, companies selling vapor, and companies that existed because capital was cheap and everyone thought the Internet would turn every bad idea into gold.

But that was not the whole story.

Some companies were not wrong. Some companies were building real technology aimed at real problems. They just arrived before the market was ready to pay for the thing at the scale required to survive.

That was one of the great lessons of Avici for me.

We were building for a traffic curve that was real, but not immediate enough. We were building for carrier needs that would become obvious later. We were building systems that made architectural sense, but the operational, financial, and customer demand environment had not fully caught up. In startup terms, that is brutal. You can be technically right and commercially dead. The market does not hand out trophies for being correct eventually. It rewards being correct when the purchase order shows up.

That is the part people miss when they compare today's AI infrastructure buildout to the dot-com bubble. The easy take is, "This looks like 2000." The smarter question is, "Which part looks like 2000?"

Because in 2000, in parts of the networking world, we were building capacity for a future that was coming but not here yet. Today, in AI, the world cannot build enough of the thing it already wants.

That does not mean there is no bubble. There will absolutely be bubble behavior. There will be bad companies. There will be weak business models. There will be rented GPU farms pretending they are strategy. There will be PowerPoint infrastructure companies where the only durable asset is the font choice. There will be software wrappers with no moat, no discipline, and no plan beyond "the model will save us."

Fine. That is how every gold rush works. Real gold attracts real miners, fake miners, shovel salesmen, con men, bankers, drunks, and guys who heard there might be snacks.

But the presence of nonsense does not make the core thing fake.

AI infrastructure today is not being built for an imaginary workload five years away. Training demand is real. Inference demand is real. Memory bandwidth demand is real. Power demand is real. Cooling demand is real. Networking demand is absolutely real. Anyone who has looked seriously at GPU clusters understands that the problem is not just "buy chips." The problem is building an entire system where compute, memory, storage, interconnect, software, scheduling, fault handling, power, and operations all converge without turning the whole thing into an expensive space heater with a dashboard.

That is why Avici keeps coming back to mind.

Not because Avici became NVIDIA. It did not. That is not the claim. Not because modern AI infrastructure is a copy of what we built. It is not. That would be lazy history, and I am not interested in stepping on anyone's toes or flattening twenty-five years of engineering into a cute metaphor.

The point is simpler and stronger.

Some architectural pressures keep coming back because physics does not care what logo is on the chassis.

At Avici, we were dealing with the problem of moving massive amounts of traffic through large systems without the fabric becoming the bottleneck. Big systems become fabric problems. That was true in carrier routing. It is true in hyperscale data centers. It is true in AI clusters. The workload changes. The silicon changes. The packaging changes. The protocols change. The budget owners change. But the ugly truth remains: moving data at scale is often harder than doing the math.

When I look at modern AI infrastructure, especially GPU clusters and the work William "Bill" Dally has done around interconnection networks, parallel systems, and high-performance computing, I do not see Avici reborn. I see a familiar pressure field. I see big fabrics. I see topology mattering. I see locality mattering. I see congestion becoming destiny. I see distributed systems where the clean diagram on the whiteboard gets beaten half to death by tail latency, failure modes, oversubscription, retry behavior, and the fact that buffers are never where you wish they were. That smell is familiar.

Old dragons remember it.

The funny thing about infrastructure is that the ideas do not die cleanly. Companies die. Product names disappear. Chassis get decommissioned. Source code gets archived, lost, or trapped forever inside some corporate archaeology layer. The engineers scatter. But the instincts survive.

The bubble did not kill the dragons. It scattered them. Some went to other startups. Some ended up at Oracle, Apple, AWS, Google, Dell, HPE, and a long list of other places where the next twenty-five years of infrastructure got built. The names on the badges changed. The scars and instincts came with them.

That matters more than people realize. Because architecture is not just what you draw. Architecture is what you remember after production teaches you humility.

A younger engineer can learn a topology. They can learn Clos. They can learn ECMP. They can learn RDMA. They can learn congestion control, telemetry, loss recovery, flow hashing, buffer behavior, failure domains, and all the vocabulary. That is all good. We need that. But there is a difference between knowing the terms and having lived inside a system where the terms became alarms at three in the morning. Scar tissue is a database. It stores the things the whitepaper left out.

It remembers that customers say they want innovation until innovation means operational risk. It remembers that being first sounds heroic until nobody wants to be the first production deployment. It remembers that performance claims are easy, but supportability is where systems go to confess their sins. It remembers that a beautiful architecture with no market timing is still a very expensive way to learn a lesson.

That is why I get impatient with shallow AI bubble talk.

If someone says, "Some AI valuations are insane," I agree.

If someone says, "A lot of AI startups have no moat," I agree.

If someone says, "There is going to be a correction," I agree.

But if someone says, "This is just like 2000," my answer is: define "just like."

Because the Internet was real in 2000. The traffic growth was real. The need for bigger systems was real. What failed was not the entire thesis. What failed was timing, capital discipline, and the number of companies trying to occupy a market before the market had enough room for them. That distinction matters.

The AI buildout has its own timing risk, but it is not the same risk.

The question is not whether demand exists. It does. The question is whether today's supply arrives at the right cost, in the right form, for the workloads that will exist by the time the concrete dries, the power contracts are signed, the GPUs are racked, and the customers are ready to run real production work.

That is the modern knife edge. Build too slowly and you miss the scarcity window. Build too expensively and you need perfect utilization forever. Build around today's training assumptions and tomorrow's inference-heavy world may punish you. Build a GPU cloud without networking discipline and you will learn very quickly that "AI infrastructure" is not just a purchasing department with access to NVIDIA allocation. Build without software, operations, support, security, and customer trust, and all you have is hot silicon and a cooling bill that looks like it was written by a Bond villain.
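To put rough numbers on "build too expensively and you need perfect utilization forever," here is a minimal back-of-the-envelope sketch. Everything in it is an assumption made for illustration: the function name, the cost model (capex amortized evenly over a fixed number of years, operating cost paid only for hours actually sold), and every dollar figure. None of it is real pricing or anyone's actual business model.

```python
def break_even_utilization(capex_per_gpu, amortization_years,
                           opex_per_gpu_hour, price_per_gpu_hour):
    """Fraction of all hours a GPU must be sold just to cover its own cost.

    Hypothetical model: capex is spread evenly over the amortization
    period, and operating cost is incurred only during sold hours.
    """
    total_hours = amortization_years * 365 * 24
    fixed_cost_per_hour = capex_per_gpu / total_hours
    margin_per_sold_hour = price_per_gpu_hour - opex_per_gpu_hour
    if margin_per_sold_hour <= 0:
        # Every sold hour loses money; no utilization level breaks even.
        return float("inf")
    return fixed_cost_per_hour / margin_per_sold_hour


# Illustrative, made-up inputs: the same price and opex, two build costs.
disciplined = break_even_utilization(30_000, 4, 0.50, 2.00)
expensive = break_even_utilization(60_000, 4, 0.50, 2.00)

print(f"disciplined build breaks even at {disciplined:.0%} utilization")
print(f"expensive build breaks even at {expensive:.0%} utilization")
```

A result above 100% means the hardware cannot pay for itself at that price no matter how busy it is. Under these assumptions, doubling the build cost doubles the utilization you need, which is the knife edge in one line of arithmetic.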

This is where the old networking lessons matter. AI is not only a compute problem. It is a systems problem. It is a fabric problem. It is a scheduling problem. It is a data movement problem. It is a failure domain problem. It is a power and cooling problem. It is a timing problem. And at scale, it becomes all of those at once, usually on a Friday night when everyone important is on a plane.

That is not new. The names are new. The stakes are higher. The speed is different. The amount of capital is absurd. But the pattern has history. The mistake is thinking history repeats as a costume party. It usually does not. History rhymes through constraints.

The constraint in the Avici era was that the future was visible, but the market was not ready to fully buy it yet. The constraint in AI today is that demand is visible, but supply, power, economics, and operational maturity are struggling to keep up. Those are different conditions. Both can burn companies. Both can reward the few who time it correctly. Both can produce spectacular nonsense along the way.

That is why this moment feels familiar, but not identical.

At Avici, we stood on the sharpest point of the cutting edge. That sounds glamorous until you understand what the sharpest point actually is. It is not a throne. It is a blade. There is no broad market cushion under you. There is no installed base large enough to absorb every mistake. There are customers who admire the technology, test the technology, praise the technology, and then buy more of what they already operate because nobody gets fired for reducing risk. Being right too early is lonely. And expensive.

Twenty-five years later, I look at AI infrastructure and I see a world where some of the old pressures finally have enough demand behind them to become unavoidable. Big systems need big fabrics. Data movement matters. Topology matters. Congestion matters. Control planes matter. Operational discipline matters. The network is not a side quest. It is part of the machine.

That does not make every AI company good. It does not make every capex plan sane. It does not mean the growth curve goes up forever at some ridiculous angle. Nothing does. Trees do not grow to the moon, and neither do GPU clusters, no matter what the deck says. But it does mean the comparison to 2000 needs to be made carefully.

The lesson from Avici is not "never build ahead." Sometimes building ahead is the only way anything important gets built. The lesson is that architecture, market timing, capital, and customer pain all have to converge. If they do not, the future can arrive without you, using ideas you helped prove, carried forward by engineers who survived the first fire.

That is not failure in the simple sense. It is harsher than that. It is being right before being useful.

And if you have lived through that once, you look at AI infrastructure differently. You are less impressed by hype, but also less dismissive of ambition. You know that some of the loudest companies will vanish. You know that some of the dumbest claims will be forgotten. You know that some expensive hardware will end up stranded, mispriced, or badly matched to the workloads that actually survive.

But you also know that the underlying pressure is real. The fire is real. The only question is who can build close enough to it without getting burned.

Old dragons remember the first fire because we have seen the future arrive early before.

This time, the future is not waiting politely at the edge of the market.

This time, the barn is already burning.
