
Microsoft Accelerates Datacenter With FPGAs 

Microsoft researchers, who have spent several years migrating the company's cloud services onto programmable hardware, are reporting an advance in high-end machine learning that the company hopes will boost datacenter processing power.

In a white paper released Monday (Feb. 23), Microsoft researchers report progress in developing what the company calls a convolutional neural network (CNN) accelerator. CNN algorithms are frequently used for image classification, image recognition and natural language processing.
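The core computation behind these workloads is the convolution itself: sliding a small filter window across an image and accumulating multiply-add results at every position. The minimal C sketch below is purely illustrative and is not Microsoft's design; it shows the single-channel, single-filter case whose multiply-accumulate loops an FPGA accelerator replicates and pipelines in hardware.

    /* Minimal single-channel 2D convolution -- the core operation a CNN
     * accelerator offloads. Illustrative only; Microsoft's FPGA design runs
     * many such operations in parallel, pipelined hardware. */
    #include <stdio.h>

    #define IN_H  6
    #define IN_W  6
    #define K     3                       /* filter size */
    #define OUT_H (IN_H - K + 1)
    #define OUT_W (IN_W - K + 1)

    static void conv2d(float in[IN_H][IN_W], float filt[K][K],
                       float out[OUT_H][OUT_W])
    {
        for (int y = 0; y < OUT_H; y++)
            for (int x = 0; x < OUT_W; x++) {
                float acc = 0.0f;
                for (int fy = 0; fy < K; fy++)       /* multiply-accumulate */
                    for (int fx = 0; fx < K; fx++)   /* over the window     */
                        acc += in[y + fy][x + fx] * filt[fy][fx];
                out[y][x] = acc;
            }
    }

    int main(void)
    {
        float in[IN_H][IN_W], filt[K][K], out[OUT_H][OUT_W];

        for (int i = 0; i < IN_H; i++)               /* simple test image */
            for (int j = 0; j < IN_W; j++)
                in[i][j] = (float)(i + j);
        for (int i = 0; i < K; i++)                  /* box (averaging) filter */
            for (int j = 0; j < K; j++)
                filt[i][j] = 1.0f / (K * K);

        conv2d(in, filt, out);
        printf("out[0][0] = %.2f\n", out[0][0]);
        return 0;
    }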

The researchers claim their design runs at three times the throughput per watt of previous CNN accelerators based on field-programmable gate arrays (FPGAs). Moreover, they claim its performance is "significantly higher" than designs that use graphics processors.

The effort stems from a project launched by Microsoft in 2011, called Catapult, that is intended to leverage FPGAs to boost datacenter performance, reduce power consumption and develop new datacenter capabilities. One early result was an FPGA board that plugs into the Windows server design eventually released as Open CloudServer V1.

Catapult is Microsoft's response to the anticipated end of Moore's Law, the bedrock of current silicon semiconductor design, which a Microsoft researcher predicts is less than one product cycle, "perhaps three years," from hitting a wall. Hence, as Microsoft processor architect Doug Burger wrote in a blog post this week, the Catapult platform "would allow cloud service performance to keep improving, once the silicon scaling hits a wall, by migrating successively larger portions from software into programmable hardware."

Microsoft unveiled the Catapult platform last year, demonstrating how an FPGA fabric spanning more than 1,600 servers in a Microsoft datacenter accelerated Bing web search ranking algorithms. In that configuration, an FPGA board was placed in each server, and the FPGAs within a rack were tightly coupled over a low-latency network. Microsoft plans to take the Catapult accelerator to production on Bing later this year.

Microsoft researchers have since focused on accelerating other key workloads with reconfigurable logic. The result is the research advance reported this week: the FPGA-based CNN accelerator.

The researchers fashioned a CNN design in reconfigurable logic using chip maker Altera Corp.'s Stratix V FPGA. On standard image classification tests, they topped previous FPGA designs by a factor of three in throughput per watt. The result "enables our datacenter servers to offer image classification at lower cost and higher energy efficiency than can be provided by medium to high-end GPUs," Burger asserted.

Microsoft said it is now mapping its processing engine to Altera’s new Arria 10 FPGA, which features "hardened support" for floating-point operations while delivering more than 1 teraflops with improved energy efficiency.

Altera said Microsoft researchers are using the Arria 10 developer kit and engineering samples of its latest FPGA, which the San Jose-based chipmaker claims delivers datacenter performance of up to 40 GFLOPS/watt. Altera said it is using OpenCL, an open parallel programming standard, to code the Arria 10 FPGA along with its hardened floating-point digital signal processing blocks.
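In the OpenCL flow, kernels are written in a C-based language and compiled offline into FPGA logic rather than launched on GPU cores. The fragment below is a hedged illustration of that style, a hypothetical element-wise multiply-add stage (the kernel and argument names are the author's own, not taken from Microsoft or Altera); on Arria 10, the multiply-add in the body is the kind of operation that can map onto a hardened floating-point DSP block instead of soft logic.

    /* Hypothetical OpenCL C kernel of the kind Altera's SDK compiles into
     * FPGA logic; illustrative only, not code from Microsoft's accelerator. */
    __kernel void fma_stage(__global const float * restrict a,
                            __global const float * restrict b,
                            __global float * restrict c)
    {
        int i = get_global_id(0);      /* one work-item per element */
        c[i] = a[i] * b[i] + c[i];     /* multiply-add, a candidate for a
                                          hard floating-point DSP block */
    }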

Microsoft is estimating a 70-percent throughput increase at comparable energy consumption. "We thus anticipate that the new Arria 10 parts will enable an even higher level of efficiency and performance for image classification within Microsoft’s datacenter infrastructure," Burger predicted.

Altera said FPGA architectures are well suited to neural network algorithms given their flexible data paths, which enable many OpenCL kernels to pass data directly to one another without using external memory. "Arria 10 has an additional architectural advantage of supporting hard floating point for both multiplication and addition," said Michael Strickland, director of Altera's Compute and Storage Business Unit. "This hard floating point enables more logic and a faster clock speed than traditional FPGA products."
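The kernel-to-kernel data passing Strickland describes is typically expressed with the channels extension in Altera's OpenCL SDK, which connects kernels through on-chip FIFOs. The sketch below is an illustration of that idea under stated assumptions (the extension pragma and channel read/write function names follow Altera's SDK documentation but should be treated as assumptions, and the kernels themselves are hypothetical, not drawn from Microsoft's design): a producer kernel streams values straight to a consumer kernel without a round trip through external memory.

    /* Hedged sketch of kernel-to-kernel streaming on an FPGA via Altera's
     * OpenCL channels extension. Identifiers follow the Altera SDK
     * documentation; treat them as assumptions, not as code from the
     * Catapult CNN accelerator. */
    #pragma OPENCL EXTENSION cl_altera_channels : enable

    channel float stage_link;    /* on-chip FIFO linking the two kernels */

    __kernel void producer(__global const float * restrict in, const int n)
    {
        for (int i = 0; i < n; i++)
            write_channel_altera(stage_link, in[i] * 2.0f);  /* upstream stage's output */
    }

    __kernel void consumer(__global float * restrict out, const int n)
    {
        float acc = 0.0f;
        for (int i = 0; i < n; i++)
            acc += read_channel_altera(stage_link);          /* read directly from producer */
        out[0] = acc;
    }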
