Advanced Computing in the Age of AI | Thursday, April 25, 2024

Xilinx Claims FPGA vs. GPU Lead, Unveils Adaptive Acceleration Platform 

Xilinx Versal ACAP

Xilinx, the outspoken champion of FPGA technology, is having an active Open Compute Project Summit this week in Amsterdam. The company has drawn battle lines with that other major force for accelerated server throughput, the GPU, announcing that its new line of Alveo FPGA accelerator cards delivers four times the performance of high-end GPUs for sub-two-millisecond, low-latency applications, and 90X that of CPUs for some database applications.

In addition, Xilinx unveiled the Versal Adaptive Compute Acceleration Platform (ACAP), designed to let developers accelerate their applications with optimized hardware and software and to adapt both to keep pace with evolving technology (more below).

Xilinx said the Alveo U200 and U250 – powered by the Xilinx UltraScale+ FPGA – are “designed to dramatically increase performance in industry-standard servers across cloud and on-premise data centers.”

Of course it has to be said – and not to cast aspersions on Xilinx – that when it comes to vendor chip performance claims, it’s never a bad idea to verify them with third parties. Still, Xilinx – the FPGA market leader with an estimated 60 percent share – no doubt offers screaming throughput.

Xilinx said that for machine learning, the Alveo U250 increases real-time inference throughput by 20X versus high-end CPUs, and that Alveo cards reduce latency by 3X versus GPUs when running real-time inference applications.
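Vendor multipliers such as "20X" and "3X" are ratios, and the direction of each ratio matters: a throughput speedup divides the accelerated rate by the baseline rate, while a latency reduction divides the baseline latency by the accelerated latency. The minimal sketch below shows how a team checking such claims on its own workloads might compute both. The helper names are hypothetical, and the measurements are invented placeholders, not Xilinx data.

```python
def throughput_speedup(baseline_per_s: float, accelerated_per_s: float) -> float:
    """How many times more work per second the accelerated system does (e.g. "20X")."""
    if baseline_per_s <= 0:
        raise ValueError("baseline throughput must be positive")
    return accelerated_per_s / baseline_per_s


def latency_reduction(baseline_ms: float, accelerated_ms: float) -> float:
    """How many times lower the accelerated latency is (e.g. "3X lower latency")."""
    if accelerated_ms <= 0:
        raise ValueError("accelerated latency must be positive")
    return baseline_ms / accelerated_ms


if __name__ == "__main__":
    # Hypothetical placeholder measurements for illustration only, NOT vendor data.
    cpu_inferences_per_s = 500.0
    fpga_inferences_per_s = 10_000.0
    gpu_latency_ms = 6.0
    fpga_latency_ms = 2.0
    speedup = throughput_speedup(cpu_inferences_per_s, fpga_inferences_per_s)
    reduction = latency_reduction(gpu_latency_ms, fpga_latency_ms)
    print(f"throughput speedup: {speedup:.1f}X")
    print(f"latency reduction: {reduction:.1f}X")
```

For the ratio to mean anything, both sides should run the same model, batch size and numeric precision; otherwise it compares setups rather than hardware.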

Xilinx Alveo Accelerator Card

The company has assembled an extensive ecosystem of OEMs and partners that have developed applications in AI/ML, video transcoding, data analytics, financial risk modeling, security and genomics, according to Xilinx. Partners include Algo-Logic Systems Inc., Bigstream, BlackLynx Inc., CTAccel, Falcon Computing, Maxeler Technologies, Mipsology, NGCodec, Skreens, SumUp Analytics, Titan IC, Vitesse Data, VYUsync and Xelera Technologies. OEMs including Dell EMC, Fujitsu Limited and IBM are collaborating with Xilinx to qualify multiple server SKUs with Alveo cards.

"FPGA-based acceleration solutions in modern data centers are gaining popularity as accelerators that can be programmed and reprogrammed easily as users see fit," said Ravi Pendekanti, SVP, product management and marketing, Dell EMC Servers & Infrastructure Systems. "Our collaboration with Xilinx to create best-in-class acceleration solutions will benefit customers in a range of applications from video content streaming to risk management and financial services."

"With 5G use cases for applications such as autonomous driving, telemedicine, and virtual reality, the range of vRAN applications based on the COTS servers is expected to expand considerably in the future," said Mr. Masaki Taniguchi, VP, deputy head of Network Products, Fujitsu Limited.  "Fujitsu Limited and Fujitsu Laboratories Ltd. have been collaborating with Xilinx to jointly validate 3X performance on critical software functions in the 4G vRAN system. Fujitsu looks forward to creating powerful solutions by combining its x86 servers and Xilinx adaptable acceleration boards."

Alveo accelerator cards are available now starting at $8,995, or they can be accessed in the Nimbix public cloud.

As for the newly launched Xilinx Versal ACAP portfolio, we wrote about the ACAP vision in August, noting the observation from industry watcher Patrick Moorhead, founder, Moor Insights & Strategy, who said "this is what the future of computing looks like."

"We are talking about the ability to do genomic sequencing in a matter of a couple of minutes, versus a couple of days," he said. "We are talking about data centers being able to program their servers to change workloads depending upon compute demands, like video transcoding, during the day and then image recognition at night. This is significant."

Xilinx said Versal is designed to deliver industry-leading performance, connectivity, bandwidth and integration for high-demand applications. The portfolio includes the AI Engine, a new hardware block designed for low-latency AI inference that also supports advanced digital signal processing (DSP) implementations for applications such as wireless and radar. The company said the AI Engine is tightly coupled with the Versal Adaptable Hardware Engines to enable whole-application acceleration, meaning that both the hardware and the software can be tuned for maximum performance.

Versal is built on TSMC's 7-nanometer FinFET process technology, and the company said the Versal portfolio is the first platform to combine software programmability with domain-specific hardware acceleration.

"With the explosion of AI and big data and the decline of Moore's Law, the industry has reached a critical inflection point," said Xilinx CEO Victor Peng. "Silicon design cycles can no longer keep up with the pace of innovation. Four years in development, Versal is the industry's first ACAP. We uniquely designed it to enable all types of developers to accelerate their whole application with optimized hardware and software and to instantly adapt both to keep pace with rapidly evolving technology. It is exactly what the industry needs at the exact moment it needs it."

Xilinx said the Versal portfolio debuts with the Versal Prime series and the Versal AI Core series, delivering an estimated 8X AI inference performance boost versus industry-leading GPUs.

Xilinx Versal ACAP device concept

The Versal AI Core series is optimized for AI and workload acceleration in cloud, networking, and autonomous technology. The series has five devices offering 128 to 400 AI Engines and includes dual-core Arm Cortex-A72 application processors, dual-core Arm Cortex-R5 real-time processors, 256KB of on-chip memory with ECC, and more than 1,900 DSP engines optimized for high-precision floating point with low latency. It also incorporates more than 1.9 million system logic cells combined with more than 130Mb of UltraRAM, up to 34Mb of block RAM, 28Mb of distributed RAM and 32Mb of new Accelerator RAM blocks, which can be accessed from any engine, the company said.

It also includes PCIe Gen4 8-lane and 16-lane and CCIX host interfaces, power-optimized 32Gb/s SerDes, up to four integrated DDR4 memory controllers, up to four multi-rate Ethernet MACs, 650 high-performance I/Os for MIPI D-PHY, NAND, storage-class memory interfacing and LVDS, plus 78 multiplexed I/Os to connect external components and more than 40 HD I/Os for 3.3V interfacing. Interconnection is provided by a network-on-chip (NoC) with up to 28 master/slave ports, delivering multi-terabit-per-second bandwidth at low latency combined with native software programmability, according to Xilinx.
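To put the PCIe Gen4 8-lane and 16-lane host interfaces in perspective, the back-of-the-envelope calculation below uses the PCIe 4.0 signaling rate of 16 GT/s per lane and its 128b/130b line encoding. It is a rough sketch, not a Xilinx figure: the function name is hypothetical, and it ignores packet and protocol overhead, so real transfers will come in somewhat lower.

```python
# PCIe 4.0 (Gen4) signals at 16 GT/s per lane with 128b/130b line encoding.
GEN4_GT_PER_S = 16.0
ENCODING_EFFICIENCY = 128 / 130


def pcie_gen4_gbytes_per_s(lanes: int) -> float:
    """Approximate per-direction bandwidth in GB/s after line encoding.

    Packet and protocol overhead (TLP headers, flow control) is ignored.
    """
    if lanes <= 0:
        raise ValueError("a link needs at least one lane")
    gbits_per_lane = GEN4_GT_PER_S * ENCODING_EFFICIENCY  # ~15.75 Gb/s per lane
    return gbits_per_lane * lanes / 8  # convert bits to bytes


for lanes in (8, 16):
    print(f"Gen4 x{lanes}: ~{pcie_gen4_gbytes_per_s(lanes):.1f} GB/s per direction")
```

This works out to roughly 15.8 GB/s for an 8-lane link and 31.5 GB/s for a 16-lane link in each direction.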

The Versal Prime series offers mid-range capabilities and is made up of nine devices, each including dual-core Arm Cortex-A72 application processors, dual-core Arm Cortex-R5 real-time processors, 256KB of on-chip memory with ECC, and more than 4,000 DSP engines optimized for high-precision floating point with low latency. It also incorporates more than 2 million system logic cells combined with more than 200Mb of UltraRAM, more than 90Mb of block RAM, and 30Mb of distributed RAM to support custom memory hierarchies.

The series also includes PCIe Gen4 8-lane and 16-lane and CCIX host interfaces, power-optimized 32Gb/s SerDes and mainstream 58Gb/s PAM4 SerDes, up to six integrated DDR4 memory controllers, up to four multi-rate Ethernet MACs, 700 high-performance I/Os for MIPI D-PHY, NAND, storage-class memory interfaces and LVDS, plus 78 multiplexed I/Os to connect external components, and more than 40 HD I/Os for 3.3V interfacing.
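The PAM4 label on the faster SerDes matters because PAM4 carries two bits per symbol (four signal levels), where two-level NRZ carries one. The small sketch below computes the symbol rate each lane must sustain. It assumes the 32Gb/s lanes use NRZ signaling, which the article does not state, and the function name is hypothetical.

```python
import math


def symbol_rate_gbaud(line_rate_gbps: float, levels: int) -> float:
    """Symbol (baud) rate needed to carry line_rate_gbps with an N-level signal."""
    if levels < 2:
        raise ValueError("a signal needs at least two levels")
    bits_per_symbol = math.log2(levels)  # NRZ: 2 levels -> 1 bit; PAM4: 4 levels -> 2 bits
    return line_rate_gbps / bits_per_symbol


# Assumption: the 32Gb/s lanes use NRZ (2-level) signaling; the article only
# says that the 58Gb/s lanes are PAM4.
print(f"32Gb/s NRZ:  {symbol_rate_gbaud(32, 2):.1f} GBaud")
print(f"58Gb/s PAM4: {symbol_rate_gbaud(58, 4):.1f} GBaud")
```

Under that assumption the 58Gb/s PAM4 lanes run at about 29 GBaud, a lower symbol rate than the 32 GBaud NRZ lanes, at the cost of tighter noise margins between the four signal levels.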

Xilinx said the Versal portfolio is enabled by a development environment with a software stack including drivers, middleware, libraries and software framework support.

EnterpriseAI