
Microsoft to Drive RDMA Into Datacenters and Clouds 

Software giant Microsoft has been using InfiniBand networking technology in its Azure cloud for a few years now, and has also been working with Mellanox Technologies to tune up Windows Server 2012 so it can take advantage of the high bandwidth and low latency of InfiniBand networks. To demonstrate its commitment to InfiniBand, to the Remote Direct Memory Access (RDMA) protocol behind it, and to the related RDMA over Converged Ethernet (RoCE), Microsoft has become a steering committee member of the InfiniBand Trade Association.

Microsoft joins existing steering committee members Cray, Emulex, Hewlett-Packard, IBM, Intel, Mellanox Technologies, and Oracle, all of whom have made big bets on InfiniBand for both server interconnects and for storage clusters.

Like other Internet application and cloud providers, Microsoft has been mostly secretive about its use of InfiniBand networking. But every now and then, we get a hint about how Microsoft is deploying it.

For instance, back in 2011, when Microsoft updated the clusters that drive its Bing Maps service, it worked with Dell to build a modular datacenter for a new image processing cluster, which is located in Longmont, Colorado. At the time, this image processing system was built on the Windows Server 2008 operating system and the SQL Server 2008 database. The cluster running the Bing imagery applications used custom hyperscale “Nucleon” servers designed by Dell around the low-powered, low-cost Opteron 4100 chips from AMD. These Nucleon servers had tens of thousands of cores across their nodes to drive the imagery applications, which stitch together maps, satellite imagery, and other media for the Bing Maps service. Dell also said at the time that the Nucleon servers were customized to offer InfiniBand technology, and that it had chosen adapter cards and switches from Mellanox. The word from inside Mellanox is that Microsoft chose InfiniBand because the total acquisition cost for InfiniBand switching was considerably lower than it was for 10 Gb/sec Ethernet networking at the time.

Microsoft has also confirmed that it has InfiniBand switching behind the HPC instances on its Windows Azure compute cloud. And there are persistent rumors that InfiniBand is used in other parts of the Azure public cloud, but nothing that Microsoft will confirm.

As EnterpriseTech has previously reported, Microsoft shipped the R2 update to Windows Server 2012 a month ago, and it worked with Mellanox specifically to goose the performance of RDMA for InfiniBand and RoCE for Ethernet, both for server clustering and for linking storage servers to compute servers through Microsoft’s SMB Direct protocol.

“Increasing numbers of server CPUs sharing a network link, coupled with ever-increasing workload virtualization, are making unprecedented demands on modern datacenter networks. RDMA networking is a key part of the solution,” said Yousef Khalidi, distinguished engineer for Windows Azure, in a statement from Microsoft announcing its joining of the steering committee. “As an active member of the IBTA, Microsoft will help promote RDMA and further drive specifications and standards to enable performance gains and reduce networking overhead on the CPUs in large, mainstream data centers.”
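To put Khalidi’s point in concrete terms, the sketch below is a generic illustration of the RDMA “verbs” programming model that both InfiniBand and RoCE adapters expose; it is not part of Microsoft’s announcement and is not how Windows implements SMB Direct. It assumes a Linux host with the standard libibverbs library installed. The key idea it shows is memory registration: the application pins a buffer with the adapter so that data can later be placed into it by a one-sided RDMA write from a peer, without the receiving host’s CPU copying packets.

```c
/*
 * Minimal, illustrative sketch of the RDMA verbs model using libibverbs.
 * Opens the first RDMA-capable device (InfiniBand or RoCE) and registers
 * a buffer so the NIC can read/write it directly, bypassing CPU copies.
 * Compile with: gcc rdma_sketch.c -libverbs
 */
#include <infiniband/verbs.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int num_devices = 0;
    struct ibv_device **devs = ibv_get_device_list(&num_devices);
    if (!devs || num_devices == 0) {
        fprintf(stderr, "no RDMA-capable devices found\n");
        return 1;
    }

    /* Open the first adapter reported by the verbs stack. */
    struct ibv_context *ctx = ibv_open_device(devs[0]);
    if (!ctx) {
        fprintf(stderr, "failed to open %s\n", ibv_get_device_name(devs[0]));
        return 1;
    }
    printf("using device: %s\n", ibv_get_device_name(devs[0]));

    /* A protection domain groups the resources the adapter may touch. */
    struct ibv_pd *pd = ibv_alloc_pd(ctx);
    if (!pd) {
        fprintf(stderr, "failed to allocate protection domain\n");
        return 1;
    }

    /* Register (pin) a buffer so the NIC can DMA into it directly.
       IBV_ACCESS_REMOTE_WRITE lets a remote peer fill it with a one-sided
       RDMA write -- no receive call, no data copy on this host's CPU. */
    size_t len = 4096;
    void *buf = malloc(len);
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_WRITE);
    if (!mr) {
        fprintf(stderr, "memory registration failed\n");
        return 1;
    }
    printf("registered %zu bytes, rkey=0x%x (a peer uses this key to RDMA-write)\n",
           len, mr->rkey);

    /* A real transfer would now create a queue pair, exchange the rkey and
       buffer address out of band, and post an IBV_WR_RDMA_WRITE work
       request; that connection setup is omitted here for brevity. */

    ibv_dereg_mr(mr);
    ibv_dealloc_pd(pd);
    free(buf);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}
```

The same verbs API is fabric-agnostic, which is one reason RDMA can accelerate InfiniBand and Ethernet networks alike.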

While it is clear that Microsoft wants to drive the adoption of RDMA into enterprise datacenters as well as clouds like Azure and others built using its software stack, Microsoft is not about to emphasize InfiniBand over Ethernet. Both networking technologies can be accelerated by RDMA approaches, and both have their places in the enterprise datacenter.
