Advanced Computing in the Age of AI | Thursday, March 28, 2024

Intel Mates FPGA With Future Xeon Server Chip 

Intel is taking field programmable gate arrays seriously as a means of accelerating applications and has crafted a hybrid chip that marries an FPGA to a Xeon E5 processor and puts them in the same processor socket.

The hybrid chip has not been given a name yet and should not be confused with the Xeon D series, which was referenced in the announcement today and which is a future Xeon system on chip design that will be based on the "Broadwell-DE" processor. The new hybrid Xeon-FPGA will slide into the same two-socket servers that a regular, general purpose Xeon E5 chip will. Diane Bryant, who is general manager of Intel's Data Center Group, announced the hybrid chip in a blog post and also spoke about it during a session at the GigaOm Structure conference in San Francisco. The Xeon-FPGA chip has been in development for around a year, and while Bryant would not say when it would be released, the word on the street is that it will be available sometime next year.

Intel did not divulge who its FPGA supplier would be, but there are really only two choices: Xilinx and Altera. The Xeon-FPGA package almost certainly includes an FPGA from Altera.

Even before new management took the helm at Intel last year, the chip maker was looking for ways to leverage its substantial lead in chip manufacturing technology, and Intel formed a partnership with Altera, one of the two suppliers of FPGAs, to allow the latter to use Intel as a foundry for its Stratix line of FPGAs. In late March, Intel and Altera extended their partnership to include the development of multi-die, 3D-stacked devices, so the signals were out there that something like the hybrid Xeon-FPGA chip would be coming along. Under the revised deal announced in March, the Stratix 10 FPGAs will be etched using Intel's 14 nanometer Tri-Gate process, which incidentally is the same process Intel will use for its "Broadwell" Xeon E5 processors. Intel is using its current 22 nanometer Tri-Gate process to make the "Haswell" line of Xeon E5 processors, which are widely expected to come out before the end of the year, and the Broadwell Xeon E5 chips are not expected for at least a year after that, and perhaps longer. The point is that on such hybrid devices, the processes used to make the two chips do not have to be the same, and they can advance at their own paces.

To combat the growing number of ARM chip suppliers, who are used to rapid development cycles and who are keen on customizing processors and the other elements of system-on-chip designs, Intel has been providing customized versions of its Xeon processors for high volume customers. And Bryant said, in fact, that Facebook and eBay, among others, had requested such modified chips to better run their particular software stacks. When you install servers in lots of 5,000 or 10,000, this is the kind of leverage you get, and with hyperscale customers accounting for perhaps 20 percent of server volumes these days, the top operators who make up the bulk of that volume can get what they want if they are willing to help Intel cover the costs. Intel, of course, does not want to let any ARM or Power alternatives into these accounts and is therefore being a lot more flexible about chip designs than it was in years gone by. Bryant said that Intel made fifteen different customized Xeon chips for these large-scale customers last year, and that it was on track to more than double that count this year.

This is increasingly how computing will get done, and the old ways of general purpose hardware and software are falling by the wayside. Or, to be more precise, most enterprises will be flying coach with a fairly wide variety of general purpose processor SKUs, while the elite with massive server fleets will be able to command processors and coprocessors better tailored to their applications. Moreover, these same companies will soak up much of the software engineering talent because they are solving interesting engineering problems. And the gap between what a normal enterprise can do and what a hyperscale operator or a large enterprise with extreme scale computing can do will grow.

"These are highly technical corporations, as you can imagine," Bryant explained. "These folks know exactly what they need to accelerate their workloads. This has been an evolving, continuous process from general purpose to system on a chip to fixed function accelerators, which we launched a few years ago, and now to a flexible, customized FPGA."

We would say that this is not so much a line of progression as a fanning out in multiple directions, placing accelerators of many types on the Xeons or next to them. In the case of the Xeon Phi x86-based parallel accelerator, for instance, Intel is moving from a coprocessor that hooks to the Xeon through a PCI-Express 2.0 link to a future "Knights Landing" chip, perhaps available early next year around the same time as the Xeon-FPGA hybrids, that can stand alone as a multicore processor with an on-chip interconnect or be used as an accelerator for Xeons.

The Xeon-FPGA hybrids will put the Xeon E5 processor and the unnamed FPGA in a single package. (Both Altera and Xilinx already sell SoC designs that mix ARM cores and their respective FPGAs on a single chip.) The Xeon-FPGA hybrid will use a coherent link (presumably one of the QuickPath Interconnect links on the chip that hook multiple processors to each other) that will allow the FPGA to read from and write to the cache hierarchy and main memory of the Xeon processor. This proximity is important because it lowers latency and boosts performance.
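To see why a coherent, in-package link matters, consider a back-of-the-envelope latency model. This is a minimal sketch under our own assumptions, not anything Intel has published: a discrete accelerator is assumed to copy its input across a link and copy results back, while a cache-coherent accelerator works directly on the Xeon's memory and pays only a per-call link latency. The class, the function names, and every parameter value are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Offload:
    """Hypothetical parameters for one offloaded call (seconds, bytes, bytes/second)."""
    kernel_time: float   # time the accelerator spends computing
    link_latency: float  # fixed round-trip cost to reach the accelerator
    bytes_in: int        # data the accelerator needs to read
    bytes_out: int       # data the accelerator produces
    bandwidth: float     # link bandwidth used for explicit bulk copies


def copy_based_time(o: Offload) -> float:
    """Discrete-card model: copy inputs over, compute, copy results back."""
    copy_time = (o.bytes_in + o.bytes_out) / o.bandwidth
    return o.link_latency + copy_time + o.kernel_time


def coherent_time(o: Offload) -> float:
    """Coherent model: the accelerator reads and writes shared memory, so no bulk
    copies. Simplification: ignores cache-miss traffic it would still generate."""
    return o.link_latency + o.kernel_time


def offload_pays_off(cpu_time: float, accel_time: float) -> bool:
    """Offloading only helps if the whole round trip beats running on the CPU."""
    return accel_time < cpu_time


if __name__ == "__main__":
    # Illustrative numbers only: 64 MB in, 8 MB out, 8 GB/s link, 1 ms kernel.
    call = Offload(kernel_time=1e-3, link_latency=1e-5,
                   bytes_in=64_000_000, bytes_out=8_000_000, bandwidth=8e9)
    cpu_time = 5e-3  # assumed time for the same routine on the Xeon cores
    print("copy-based:", copy_based_time(call), offload_pays_off(cpu_time, copy_based_time(call)))
    print("coherent:  ", coherent_time(call), offload_pays_off(cpu_time, coherent_time(call)))
```

With these made-up numbers, the copies alone (about 9 ms) swamp a 1 ms kernel, so a 5 ms CPU routine is not worth offloading over the copy-based path but is worth offloading over the coherent one. The model deliberately ignores everything except data movement, which is the effect the in-package design targets.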

"FPGAs can give 10X, 20X, or 30X performance improvements, and by moving it into the Xeon package, it will double the performance again," said Bryant.
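Bryant's figures describe speedups for the accelerated routine itself. How much of that reaches a whole application depends on how much of its run time is offloaded, which is what Amdahl's law captures. The sketch below applies it to her 10X to 30X figures and to the claimed in-package doubling; the 90 percent offload fraction is an illustrative assumption, not a number from Intel.

```python
def amdahl_speedup(offloaded_fraction: float, kernel_speedup: float) -> float:
    """Overall speedup when only `offloaded_fraction` of the run time is accelerated
    by `kernel_speedup`; the remainder still runs at its original speed."""
    if not 0.0 <= offloaded_fraction <= 1.0:
        raise ValueError("offloaded_fraction must be between 0 and 1")
    if kernel_speedup <= 0:
        raise ValueError("kernel_speedup must be positive")
    return 1.0 / ((1.0 - offloaded_fraction) + offloaded_fraction / kernel_speedup)


if __name__ == "__main__":
    p = 0.9  # assumption: 90 percent of run time is spent in the offloaded routine
    for s in (10, 20, 30):
        # Compare the standalone FPGA figure with the claimed in-package doubling.
        print(f"kernel {s}X -> app {amdahl_speedup(p, s):.2f}X; "
              f"kernel {2 * s}X -> app {amdahl_speedup(p, 2 * s):.2f}X")
```

With 90 percent of the work offloaded, doubling a 10X kernel to 20X lifts the application from about 5.3X to about 6.9X, and no kernel speedup can push it past 10X, because the remaining 10 percent still runs on the Xeon cores.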

The Xeon-FPGA hybrid will be useful in two ways, Bryant said. First, it serves as a means of prototyping and testing what kinds of acceleration features might be useful inside of a Xeon chip. The idea is to grab a Xeon-FPGA combo, write the logic to accelerate a function in a hardware description language such as Verilog, and then offload that routine from the code running on the Xeon to the FPGA to show what kind of speedup is possible. At a certain volume of chips, it makes sense to move an accelerated function onto the Xeon die, and this will prove the case. Or perhaps it will not, in which case the accelerated function can simply stay put on the FPGA.
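The prototyping workflow comes down to running the same routine two ways and comparing the results. Here is a minimal Python sketch of that pattern. There is no public programming interface for the unnamed Xeon-FPGA part, so `accelerated` is a hypothetical stand-in for whatever host library would hand the routine to the FPGA. The harness checks that both paths agree, times them, and keeps the software path if no accelerator is present.

```python
import time
from typing import Callable, Optional, Sequence

Routine = Callable[[Sequence[int]], int]


def benchmark(fn: Routine, data: Sequence[int], repeats: int = 5) -> float:
    """Return the best wall-clock time of `repeats` runs, in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(data)
        best = min(best, time.perf_counter() - start)
    return best


def evaluate_offload(software: Routine, accelerated: Optional[Routine],
                     data: Sequence[int]) -> dict:
    """Compare a software routine against a (hypothetical) offloaded version."""
    if accelerated is None:
        # No accelerator available: the routine simply stays in software.
        return {"use_accelerator": False, "speedup": 1.0}
    # An offload is only useful if it produces the same answer.
    if software(data) != accelerated(data):
        raise RuntimeError("accelerated result does not match software result")
    t_sw = benchmark(software, data)
    t_acc = benchmark(accelerated, data)
    speedup = t_sw / t_acc if t_acc > 0 else float("inf")
    return {"use_accelerator": speedup > 1.0, "speedup": speedup}
```

In a real prototype the `accelerated` callable would wrap the FPGA host runtime; here any Python callable with the same signature works, which also makes the harness easy to exercise without hardware.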

The key thing that hyperscale customers, and their peers in the Global 2000 and the supercomputer labs of the world who have also been toying with or deploying into production various kinds of accelerators over the years, want is flexibility. They want options, and they want to reduce the total cost of ownership of their computing infrastructure every year, because that is the only way to be profitable as businesses scale up.

Lucky for them, the size of their chip orders gets them special treatment. But this is not only about hyperscale Internet companies, cloud service providers, or telecommunication firms. Everybody is looking to add machine learning to their software stack, and FPGA and GPU accelerators are a necessity here to boost the performance and bring the cost down. Financial services firms have long used FPGAs to accelerate relatively static functions, and some upstarts have created entire trading systems on FPGAs. The oil and gas industry is familiar with FPGAs, digital signal processors, and GPUs as accelerators, too. The Xeon-FPGA half-bloods will get some attention from such industries, for sure, provided the performance of the FPGA is acceptable and the cost is not too high.

EnterpriseAI