Deep learning, the artificial-intelligence technology that powers voice assistants, autonomous cars, and Go champions, relies on complicated “neural network” software arranged in layers. A deep-learning system can live on a single computer, but the biggest ones are spread over thousands of machines wired together into “clusters,” which sometimes live at large data centers, like those operated by Google. In a big cluster, as many as forty-eight pizza-box-size servers slide into a rack as tall as a person; these racks stand in rows, filling buildings the size of warehouses. The neural networks in such systems can tackle daunting problems, but they also face clear challenges. A network spread across a cluster is like a brain that’s been scattered around a room and wired together. Electrons move fast, but, even so, cross-chip communication is slow, and uses extravagant amounts of energy.
Eric Vishria, a general partner at Benchmark, a venture-capital firm in San Francisco, first came to understand this problem in the spring of 2016, while listening to a presentation from a new computer-chip company called Cerebras Systems. Benchmark is known for having made early investments in companies such as Twitter, Uber, and eBay—that is, in software, not hardware. The firm looks at about two hundred startup pitches a year, and invests in maybe one. “We’re in this kissing-a-thousand-frogs kind of game,” Vishria told me. As the presentation started, he had already decided to toss the frog back. “I’m, like, Why did I agree to this? We’re not gonna do a hardware investment,” he recalled thinking. “This is so dumb.”
Andrew Feldman, Cerebras’s co-founder, began his slide deck with a cover slide, then a team slide, catching Vishria’s attention: the talent was impressive. Then Feldman compared two kinds of computer chips. First, he looked at graphics-processing units, or G.P.U.s—chips designed for creating 3-D images. For a variety of reasons, today’s machine-learning systems depend on these graphics chips. Next, he looked at central processing units, or C.P.U.s—the general-purpose chips that do most of the work on a typical computer. “Slide 3 was something along the lines of, ‘G.P.U.s actually suck for deep learning—they just happen to be a hundred times better than C.P.U.s,’ ” Vishria recalled. “And, as soon as he said it, I was, like, facepalm. Of course! Of course!” Cerebras was proposing a new kind of chip—one built not for graphics but for A.I. specifically.
Vishria had grown used to hearing pitches from companies that planned to use deep learning for cybersecurity, medical imaging, chatbots, and other applications. After the Cerebras presentation, he talked with engineers at some of the companies that Benchmark had helped fund, including Zillow, Uber, and Stitch Fix; they told him that they were struggling with A.I. because “training” the neural networks took too long. Google had begun using super-fast “tensor-processing units,” or T.P.U.s—special chips it had designed for artificial intelligence. Vishria knew that a gold rush was under way, and that someone had to build the picks and shovels.
That year, Benchmark and Foundation Capital, another venture-capital company, led a twenty-seven-million-dollar round of investment in Cerebras, which has since raised close to half a billion dollars. Other companies are also making so-called A.I. accelerators; Cerebras’s competitors—Groq, Graphcore, and SambaNova—have raised more than two billion dollars in capital combined. But Cerebras’s approach is unique. Instead of making chips in the usual way—by printing dozens of them onto a large wafer of silicon, cutting them out of the wafer, and then wiring them to one another—the company has made one giant “wafer-scale” chip. A typical computer chip is the size of a fingernail. Cerebras’s is the size of a dinner plate. It is the largest computer chip in the world.
Even competitors find this feat impressive. “It’s all new science,” Nigel Toon, the C.E.O. and co-founder of Graphcore, told me. “It’s an incredible piece of engineering—a tour de force.” At the same time, another engineer I spoke with described it, somewhat defensively, as a science project—bigness for bigness’s sake. Companies have tried to build mega-chips in the past and failed; Cerebras’s plan amounted to a bet that surmounting the engineering challenges would be possible, and worth it. “To be totally honest with you, for me, ignorance was an advantage,” Vishria said. “I don’t know that, if I’d understood how difficult it was going to be to do what they did, I would have had the guts to invest.”
Computers get faster and faster—a remarkable fact that’s easy to take for granted. It’s often explained by means of Moore’s Law: the pattern identified in 1965 by the semiconductor pioneer Gordon Moore, according to which the number of transistors on a chip doubles every year or two. Moore’s Law, of course, isn’t really a law. Engineers work tirelessly to shrink transistors—the on-off switches through which chips function—while also refining each chip’s “architecture,” creating more efficient and powerful designs.
Chip architects had long wondered if a single, large-scale computer chip might be more efficient than a collection of smaller ones, in roughly the same way that a city—with its centralized resources and denser blocks—is more efficient than a suburb. The idea was first tried in the nineteen-sixties, when Texas Instruments made a limited run of chips that were a couple of inches across. But the company’s engineers encountered the problem of yield. Manufacturing defects inevitably imperil a certain number of circuits on any given silicon wafer; if the wafer contains fifty chips, a company can throw out the bad ones and sell the rest. But if each successful chip depends on a wafer’s worth of working circuits, a lot of expensive wafers will get trashed. Texas Instruments figured out workarounds, but the tech—and the demand—wasn’t there yet.
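The yield arithmetic can be made concrete with a rough sketch. It assumes a simple Poisson defect model, in which the chance that a piece of silicon comes out defect-free falls off exponentially with its area. The defect density and chip areas below are hypothetical numbers chosen for illustration, not figures from Texas Instruments or any other manufacturer.

```python
import math

# Assumption: defects land randomly and independently on the wafer
# (a Poisson model), so the fraction of defect-free chips is
# exp(-defect_density * chip_area).
def poisson_yield(defects_per_cm2: float, area_cm2: float) -> float:
    """Probability that a chip of the given area has zero defects."""
    return math.exp(-defects_per_cm2 * area_cm2)

# Hypothetical numbers for illustration only.
DEFECTS_PER_CM2 = 0.1      # assumed defect density
SMALL_CHIP_AREA = 1.0      # one of fifty chips on a wafer, in cm^2
WHOLE_WAFER_AREA = 50.0    # a single chip using all fifty chips' worth of silicon

small_chip_yield = poisson_yield(DEFECTS_PER_CM2, SMALL_CHIP_AREA)
whole_wafer_yield = poisson_yield(DEFECTS_PER_CM2, WHOLE_WAFER_AREA)

print(f"small chip yield:  {small_chip_yield:.1%}")
print(f"whole-wafer yield: {whole_wafer_yield:.1%}")
```

Under these assumed numbers, roughly nine in ten small chips work, so a company discards a few and sells the rest. A chip that needs the entire wafer to be flawless survives fewer than one time in a hundred, which is why a wafer-size design needs some way to tolerate defects rather than avoid them.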
An engineer named Gene Amdahl had another go at the problem in the nineteen-eighties, founding a company called Trilogy Systems. It became the largest startup that Silicon Valley had ever seen, receiving about a quarter of a billion dollars in investment. To solve the yield problem, Trilogy printed redundant components on its chips. The approach improved yield but decreased the chip’s speed. Meanwhile, Trilogy struggled in other ways. Amdahl killed a motorcyclist with his Rolls-Royce, leading to legal troubles; the company’s president developed a brain tumor and died; heavy rains delayed construction of the factory, then rusted its air-conditioning system, leading to dust on the chips. Trilogy gave up in 1984. “There just wasn’t an appreciation of how hard it was going to be,” Amdahl’s son told the Times.
If Trilogy’s tech had succeeded, it might now be used for deep learning. Instead, G.P.U.s—chips made for video games—are solving scientific problems at national labs. The repurposing of the G.P.U. for A.I. depends on the fact that neural networks, for all their sophistication, rely upon a lot of multiplication and addition. As the “neurons” in a network activate one another, they amplify or diminish one another’s signals, multiplying them by coefficients called connection weights. An efficient A.I. processor will calculate many activations in parallel; it will group them together as lists of numbers called vectors, or as grids of numbers called matrices, or as higher-dimensional blocks called tensors. Ideally, you want to multiply one matrix or tensor by another in one fell swoop. G.P.U.s are designed to do similar work: calculating the set of shapes that make up a character, say, as it flies through the air.
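The multiply-and-add work described above can be sketched in a few lines. This is a minimal illustration, not how any particular chip or library does it: the function names, the weights, and the inputs are all made up for the example, and it uses plain Python lists where a real system would use hardware-accelerated matrix routines.

```python
# Assumption: one "layer" of a network, where each output neuron's signal
# is the sum of its inputs multiplied by connection weights.

def layer_activations(inputs, weights):
    """Multiply one input vector by a weight matrix.

    inputs:  list of numbers, one per input neuron (a vector)
    weights: list of rows, one row of connection weights per output neuron
    """
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]

def batch_activations(batch, weights):
    """Apply the same weights to several input vectors (a matrix of inputs)."""
    return [layer_activations(inputs, weights) for inputs in batch]

# Hypothetical weights: two output neurons, each listening to two inputs.
weights = [
    [0.5, -1.0],   # first neuron amplifies input 0, diminishes input 1
    [2.0, 0.25],   # second neuron amplifies both
]

single = layer_activations([1.0, 2.0], weights)
batch = batch_activations([[1.0, 2.0], [0.0, 4.0]], weights)

print(single)  # [-1.5, 2.5]
print(batch)   # [[-1.5, 2.5], [-4.0, 1.0]]
```

Every multiplication in the inner sum is independent of the others, which is what lets a processor with thousands of cores perform them at the same time rather than one after another, as this sketch does.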
“Trilogy cast such a long shadow,” Feldman told me recently. “People stopped thinking, and started saying, ‘It’s impossible.’ ” G.P.U. companies—among them Nvidia—seized the opportunity by customizing their chips for deep learning. In 2015, with some of the computer architects with whom he’d co-founded his previous company—SeaMicro, a maker of computer servers, which he’d sold to the chipmaker A.M.D. for three hundred and thirty-four million dollars—Feldman began kicking around ideas for a bigger chip. They worked on the problem for four months, in an office borrowed from a V.C. firm. When they had the outlines of a plausible solution, they spoke to eight firms; received investment from Benchmark, Foundation Capital, and Eclipse; and started hiring.
Cerebras’s first task was to address the manufacturing difficulties that bedevil bigger chips. A chip begins as a cylindrical ingot of crystallized silicon, about a foot across; the ingot gets sliced into circular wafers a fraction of a millimetre thick. Circuits are then “printed” onto the wafer, through a process called photolithography. Chemicals sensitive to ultraviolet light are carefully deposited on the surface in layers; U.V. beams are then projected through detailed stencils called reticles, and the chemicals react, forming circuits.
Typically, the light projected through the reticle covers an area that will become one chip. The wafer then moves over and the light is projected again. After dozens or hundreds of chips are printed, they’re laser-cut from the wafer. “The simplest way to think about it is, your mom rolls out a round sheet of cookie dough,” Feldman, who is an avid cook, said. “She’s got a cookie cutter, and she carefully stamps out cookies.” It’s impossible, because of the laws of physics and optics, to build a bigger cookie cutter. So, Feldman said, “We invented a technique such that you could communicate across that little bit of cookie dough between the two cookies.”
In Cerebras’s printing system—developed in partnership with T.S.M.C., the company that manufactures its chips—the cookies overlap at their edges, so that their wiring lines up. The result is a single, “wafer-scale” chip, copper-colored and square, which is twenty-one centimetres on a side. (The largest G.P.U. is a little less than three centimetres across.) Cerebras produced its first chip, the Wafer-Scale Engine 1, in 2019. The WSE-2, introduced this year, uses denser circuitry, and contains 2.6 trillion transistors collected into eight hundred and fifty thousand processing units, or “cores.” (The top G.P.U.s have a few thousand cores, and most C.P.U.s have fewer than ten.)
Aart de Geus, the chairman and co-C.E.O. of the company Synopsys, asked me, “2.6 trillion transistors is astounding, right?” Synopsys provides some of the software that Cerebras and other chipmakers use to make and verify their chip designs. In designing a chip, de Geus said, an engineer starts with two central questions: “Where does the data come in? Where is it being processed?” When chips were simpler, designers could answer these questions at drafting tables, with pencils in hand; working on today’s far more complex chips, they type code that describes the architecture they want to create, then move on to using visual and coding tools. “Think of seeing a house from the top,” de Geus said. “Is the garage close to the kitchen? Or is it close to the bedroom? You want it close to the kitchen—otherwise, you will have to carry groceries all through the house.” He explained that, having designed the floor plan, “you might describe what happens inside a room using equations.”