
Microsoft Joins The Open Source Hardware Movement 

By virtue of its many online applications and its fast-growing Windows Azure compute and storage cloud, Microsoft has become one of the dominant players in the cloud in a relatively short period of time. While Microsoft is a tough competitor in any market it plays in, the company has learned to cooperate when that is the better option. Hence, Microsoft is joining the open source hardware movement by becoming a member of the Open Compute Project.

It might seem odd to see Microsoft at an open source event, and Bill Laing, corporate vice president of cloud and enterprise at the company, joked as much in his opening keynote at the Open Compute Summit in San Francisco. "We are doing this because we want to drive innovation in cloud computing and datacenter design," he said. "And of course, we want to learn from the community."

The Open Compute Project, which was founded by social media giant Facebook nearly three years ago, seeks to bring the open source and sharing philosophy of the Linux operating system and myriad other applications to the design of servers, storage, datacenters, and most recently networking. The idea is to bring together component suppliers, customers, and system integrators to create what is in essence a virtual IT supplier that puts together equipment that competes with that supplied by the incumbent vendors. Hyperscale datacenter operators like Facebook and big financial services companies such as Goldman Sachs and Fidelity Investments have put their weight behind the effort because they want to share the engineering workload and they want that alternative – and presumably less expensive – supply chain. They also want the ability to hack all elements of the stack, including the systems software running on servers, storage, and networks, something that has been sorely lacking especially in the networking space.

To date, Microsoft has been very secretive about the infrastructure that goes into its Windows Azure cloud, and it is by no means divulging the designs of the several large containerized datacenters it has created with the help of Dell and Hewlett-Packard or all of the server and storage gear that is currently in these datacenters.

In a blog post, Laing stole his own thunder from the keynote he gave at the Open Compute Summit on Tuesday, saying that Microsoft was joining Facebook as the only cloud service provider to open up its own server designs. (Amazon Web Services has custom servers, but has not open sourced their designs, and Rackspace Hosting is planning to use slightly modified versions of Facebook servers in its datacenters.)

Bill Laing, corporate vice president of cloud and enterprise at Microsoft

Laing also bragged a bit about Microsoft's scale, saying that it started managing its own datacenters in 1989 and launched MSN, its first online service, six years later. Laing said that Microsoft has invested over $15 billion in cloud infrastructure (most of that is to support Microsoft's online applications, not for its infrastructure and platform cloud services sold under the Windows Azure brand) and now has more than 200 cloud services feeding out to 20 million businesses and 1 billion customers around the world. These services include MSN as well as Bing, Office 365, Skype, SkyDrive, Xbox Live, and Windows Azure.

"Simply put, we have learned a tremendous amount building and operating some of the world's largest cloud services," Laing added.

The Microsoft Cloud Server was built to host the Windows Azure public cloud as well as the Office 365 and Bing services. It looks similar to the tray servers that have become popular among hyperscale datacenter operators.

This chassis is 40 percent less expensive than what Laing called "traditional server designs," by which we presume he means general-purpose rack servers from the tier one server makers. Those savings do not come solely from stripping out high availability components, which are unnecessary because modern software runs across clusters of servers with built-in data replication and failover; Laing said that most of the reduction comes from operational savings. Microsoft has also cut power consumption by 15 percent and has been able to cut deployment and service times in half thanks to the modular tray design. By shifting to this vanity-free, modular design, Microsoft estimates that it can eventually eliminate 10,000 tons of metal and 1,100 miles of network cabling from its datacenters.

In a separate blog post, Kushagra Vaid, general manager of server engineering for the Global Foundation Services unit at Microsoft, which runs its data centers, provided a little more detail on the server and storage design that Microsoft donated to the Open Compute Project.

Each tray in the system can be configured as a compute node or a storage node. The compute node, as you can see from the mechanical drawings, has two processors (presumably Intel Xeons) and four 3.5-inch disk drives. Like many hyperscale operators, Microsoft goes for cheap, fat 3.5-inch SATA drives rather than faster and more expensive 2.5-inch SAS drives. The disk expansion node has room for ten 3.5-inch drives. Each tray is 1U high, and the 12U chassis has room for a total of 24 nodes. Microsoft says the setup uses high-efficiency yet commodity power supplies, and the enclosure uses large fans at the rear of the chassis to pull air through it more efficiently. Normal 1U rack servers have banks of tiny fans, which make a lot of noise because they have to spin faster to move the same amount of air as a larger fan. A larger fan moves air more efficiently than several smaller ones for the same amount of energy, so breaking the fan free from the 1U enclosure matters. Removing anything from the front of the server node, such as a vendor bezel, also helps improve airflow.
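That fan argument can be sanity-checked with the textbook fan affinity laws; the idealized scaling below is standard fan theory, not a figure Microsoft has published:

```latex
% Fan affinity laws (idealized): for a fan of diameter D at speed N,
\[
  Q \propto N D^{3}, \qquad P \propto N^{3} D^{5}
\]
% where Q is airflow and P is shaft power. To match a big fan's airflow
% with a fan of half the diameter, the small fan must spin
% (D_1/D_2)^3 = 8 times faster, so its power draw is
\[
  \frac{P_2}{P_1}
    = \left(\frac{N_2}{N_1}\right)^{3}\left(\frac{D_2}{D_1}\right)^{5}
    = 8^{3} \cdot \frac{1}{2^{5}} = 16
\]
% roughly 16 times higher for the same airflow in this ideal model,
% which is why moving fans out of the cramped 1U form factor pays off.
```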

The Microsoft design has a shared signal backplane that the compute and storage nodes plug into, as well as shared power distribution across the chassis and a shared management console. In this regard, the Microsoft setup is more like a blade server than a bare-bones hyperscale design. The server design has a passive backplane, which Microsoft says is simpler and gives better signal integrity between the components that plug into it, and the server and storage nodes have a flexible interconnect architecture that allows 10 Gb/sec or 40 Gb/sec Ethernet over either copper or optical links. The server and storage nodes are hot pluggable and do not require recabling during servicing, which is a big deal when you have to manage more than 1 million servers.
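To make that layout concrete, here is a minimal sketch of the chassis as a data structure, assuming the slot counts and node types described above; the class and field names are our own invention for illustration, not anything from Microsoft's spec:

```python
from dataclasses import dataclass, field
from typing import List, Literal

NodeKind = Literal["compute", "storage"]

@dataclass
class Node:
    kind: NodeKind
    drives_35in: int          # 4 on compute nodes, 10 on storage nodes
    nic_speed_gbps: int = 10  # flexible interconnect: 10 or 40 GbE

@dataclass
class Chassis:
    height_u: int = 12
    slots: List[Node] = field(default_factory=list)  # up to 24 1U nodes

    def add_node(self, node: Node) -> None:
        if len(self.slots) >= 24:
            raise ValueError("chassis is fully populated (24 nodes)")
        # Nodes plug into the shared passive backplane, so adding one
        # needs no recabling -- they are hot pluggable.
        self.slots.append(node)

# Example: a partially populated chassis mixing both node types.
chassis = Chassis()
for _ in range(8):
    chassis.add_node(Node(kind="compute", drives_35in=4))
for _ in range(4):
    chassis.add_node(Node(kind="storage", drives_35in=10, nic_speed_gbps=40))
print(len(chassis.slots), "nodes installed")
```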

The chassis also has a homegrown system management controller based on an X86 system-on-chip (Microsoft has not said which one it has used) and a set of homegrown server diagnostic and control software, called Chassis Manager, which Microsoft is also going to open source through its Open Tech organization. This software has been used in the Azure cloud, and Microsoft says it can improve operational agility by 75 percent compared to the tools shipped with general-purpose machines. Chassis Manager has the usual command line interface and REST APIs expected of modern management tools.
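Microsoft has not published the API surface here, so the following is only a hedged sketch of what driving a chassis-level REST management interface could look like; the host name, port, endpoint paths, and response fields are hypothetical, not Chassis Manager's actual API:

```python
import requests

# Hypothetical client for a chassis-level REST management API in the
# style described above. Everything about the URL scheme and JSON
# fields is an illustrative assumption, not Chassis Manager's real API.
BASE_URL = "https://chassis-01.example.com:8000"

def get_node_health(slot: int) -> dict:
    """Fetch health status for the node in the given chassis slot."""
    resp = requests.get(f"{BASE_URL}/api/nodes/{slot}/health", timeout=5)
    resp.raise_for_status()
    return resp.json()

def power_cycle_node(slot: int) -> None:
    """Ask the management controller to power-cycle one node."""
    resp = requests.post(f"{BASE_URL}/api/nodes/{slot}/power-cycle", timeout=5)
    resp.raise_for_status()

if __name__ == "__main__":
    # Walk all 24 slots and report anything that is not healthy.
    for slot in range(1, 25):
        health = get_node_health(slot)
        if health.get("status") != "ok":
            print(f"slot {slot}: {health.get('status')}")
```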

The server design that Microsoft has opened up at the Open Compute Project is similar to, but distinct from, another modular design for which the company sought patent protection in December. Microsoft is providing the CAD files for the nodes and the chassis and the Gerber files for the printed circuit boards used in the system.

Here's another neat development: Mark Shaw, director of hardware development for Microsoft's Global Foundation Services organization, has been voted in as chair of the Open Compute Project's server committee.

What Laing did not talk about during his presentation is who would be making the servers that Microsoft itself uses in its datacenters. He did say that three different suppliers have already created designs based on the specification. So if Dell and Hewlett-Packard have built most of the servers to date, they clearly now have some competition.
