Advanced Computing in the Age of AI | Friday, March 29, 2024

Startup Slashes Latencies With Server-Switch Hybrid 

For some workloads, the most important thing is cutting out as much latency as possible from the hardware and software stack. For many of those same latency-sensitive customers, simplifying the infrastructure is equally important. And that is why the Freedom Server-Switch from Pluribus Networks, which has just come out of stealth mode, is going to get plenty of attention.

TIBCO Software has been working for more than a year, behind the scenes, to get its FTL high-performance messaging running on the Freedom Server-Switch and, EnterpriseTech hears, has been testing it at one of its large accounts on a cluster that has more than 1,000 nodes.

The word on the street is that these Freedom machines are also being tested at a high-frequency trading firm on Wall Street. Oracle has backed the use of the Freedom hybrids in OpenStack clouds based on its Solaris Unix variant. CloudFlare, one of a number of upstart content delivery networks that are trying to take on industry juggernaut Akamai Technologies and which already accounts for 5 percent of all Internet traffic, has deployed around 50 of the Freedom Server-Switch hybrids in its datacenters. The company did so because the machines have higher bandwidth on the switch dataplane and drop packets less often than standard top-of-rack switches. The plan at CloudFlare is to have another 200 installed within the next year.

Pluribus was founded in April 2010, and the idea was to create a top-of-rack switch with enough processing capacity and a virtualized environment so network applications that normally run on appliances separate from the switch – and out of band with the bits being passed around on networks – could be moved inside of the server-switch hybrid and brought in-band.

This brings to the network stack the kind of programmability that virtualized server environments have had for years, from the base Layer 2 switching and Layer 3 routing up through the Layers 4 through 7 where network applications live.

To create the Freedom Server-Switch, Pluribus started with a big, beefy two-socket Xeon server of the kind commonly used to run applications. Instead of putting a fast network adapter card on it to reach out to top-of-rack switches, the Freedom hybrid has a network switch chip welded right onto the motherboard. Doing so cuts out a lot of the middle stuff between a network appliance and the switch, eliminating costs and reducing latencies. By design, the links between the server and switch portions of the device also have more bandwidth than is typically available between network coprocessors and the switch ASICs inside of a modern top-of-rack switch.

Intel has been showing off reference architectures of server-switch hybrids mixing its Xeon CPU and "Alta" FM6000 switch ASICs for the past year as the foundation of network function virtualization, or NFV, platforms. But Pluribus co-founder and CTO Sunay Tripathi tells EnterpriseTech that there is more to it than slapping Linux onto a machine with CPUs and switch ASICs in the same box. The secret sauce for the Freedom machine is called Netvisor, and it is a virtualized environment that unites the serving and switching functions in the machine.

"Just putting a standard Linux on these reference platforms and not really doing anything with the switching side is, I would say, just marketing at best and is just a cost-optimized switch that does little else that a switch used to do," Tripathi says. "Our take on it is that you need for networking to become a little more powerful, since it is in the middle of everything. You need the NetOps guys to feel that they can use the network applications to do their jobs, not that they have to give something up to save money."

The Freedom Server-Switch hardware looks familiar to anyone who has seen the inside of a standard two-socket Xeon server:

pluribus-freedom-server-switch-1

The Freedom machine comes in a number of different configurations in two basic flavors. The E68 series pairs a single-socket "Ivy Bridge" Xeon E3-1265L processor with a Broadcom "Trident-II" switch ASIC, and it is a standard reference design made by an original design manufacturer (whom Tripathi would not name). It comes in a 1U chassis and has two 120 GB solid state disks for loading up the Netvisor switch operating system.

The other design, the F64 series, puts one or two "Sandy Bridge" Xeon E5-2600 processors in the same system as an Intel Alta FM6400 switch chip (which comes from Intel's acquisition of Fulcrum Microsystems a few years back). Broadcom and Intel are the two biggest suppliers of merchant chips for switches. This machine also has a pair of 120 GB SSDs for storing Netvisor.

Both styles of Freedom machines look like a server in the front and have their switch ports on the back. Here are the feeds and speeds of the machines:

pluribus-freedom-table

As you can see, there is only one configuration of the E68, but there are three configurations of the F64, which scale up the processing, memory, and disk or Fusion-io flash capacity as needed. The top-end F64-XL is perhaps the most interesting machine of the lot, with two 1 TB disk drives, two 300 GB SSDs, and up to four optional Fusion-io ioDrive2 PCI-based flash cards. (These range in size between 365 GB and 3 TB each.)
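To put that range in perspective, here is a back-of-the-envelope Python sketch that totals the raw storage in an F64-XL using only the capacities quoted above. The function and constant names are purely illustrative, and the sketch assumes that all installed flash cards are the same size and that 1 TB equals 1,000 GB.

```python
# Raw capacity figures for the F64-XL as quoted in the article, in GB.
# Assumptions: 1 TB is treated as 1,000 GB; all names are illustrative only.
DISK_GB = 2 * 1000        # two 1 TB disk drives
SSD_GB = 2 * 300          # two 300 GB SSDs
FLASH_CARD_MIN_GB = 365   # smallest ioDrive2 card size quoted
FLASH_CARD_MAX_GB = 3000  # largest ioDrive2 card size quoted
MAX_FLASH_CARDS = 4       # the article mentions up to four optional cards


def raw_capacity_gb(flash_cards: int, card_gb: int) -> int:
    """Total raw capacity in GB with `flash_cards` identical flash cards."""
    if not 0 <= flash_cards <= MAX_FLASH_CARDS:
        raise ValueError("the article describes at most four flash cards")
    return DISK_GB + SSD_GB + flash_cards * card_gb


if __name__ == "__main__":
    print(raw_capacity_gb(0, 0))                  # disks and SSDs only
    print(raw_capacity_gb(4, FLASH_CARD_MIN_GB))  # four of the smallest cards
    print(raw_capacity_gb(4, FLASH_CARD_MAX_GB))  # four of the largest cards
```

Under those assumptions, a machine with no flash cards holds 2.6 TB of raw capacity, and a fully loaded one lands somewhere between roughly 4 TB and 14.6 TB.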

The Netvisor stack is meant to be able to run on a variety of X86 processors as well as merchant silicon from Intel and Broadcom. It uses the open source KVM hypervisor as a virtualization layer inside the switch. Netvisor has about two million lines of code and is a collection of C libraries, kernel modules, daemons, and network code that can run atop Ubuntu Server or CentOS Linux or the Illumos distribution of OpenSolaris. Through a partnership with Oracle, the official Solaris 11 operating system is also supported on the Freedom iron, and Oracle is a reseller of the machine as well. Pluribus will also, for a fee, test and certify the Netvisor stack to run on a different operating system on its hardware. The KVM slices in turn run network applications. These include Layer 2 networking code developed by Pluribus and Layer 3 routing, which is based on the open source Quagga project that has been used internally by Google.

Here is how it all stacks up:

pluribus-freedom-server-switch-3

And here, conceptually, is how the Netvisor stack differs from a virtualized network appliance that would be sitting out-of-band from the switch:

pluribus-freedom-server-switch-2

One of the interesting possibilities, of course, is to run actual server applications on the beefier versions of the Freedom Server-Switch hybrid.

"TIBCO is bringing their entire FTL messaging server onto the F64-XL," says Tripathi. "They want to control the switch and the network, and they always deploy in two to four nodes and they already use Fusion-io for storage. And now, their entire application is running on these appliances."

FTL is used as a backbone for risk analysis and order systems at capital market firms as well as for data feed handlers, just to name a few use cases. The way FTL messaging networks are set up, explains Tripathi, is that at least two nodes are clustered together in an active-active setup to ensure high availability, and a lot of the time four nodes are clustered together. To scale the throughput of an FTL messaging network, more of these two-node or four-node groups are added as the workload grows, he says.

TIBCO is deploying the combination of the Pluribus Freedom hardware and its FTL software as the FTL Message Switch. Denny Page, senior vice president of messaging at TIBCO, explained the reasoning behind the creation of the appliance in a video presentation that EnterpriseTech was able to see before it posts to the Pluribus site.

"Previously, customers would deploy the network infrastructure and the computing infrastructure completely separately, giving them two disparate items to monitor, two disparate items to integrate, often with different teams managing the different solutions," explained Page. "With the Pluribus architecture, we finally have the ability to marry high-performance network processing and high-performance compute together and deploy messaging applications directly into the network infrastructure. This allows us to reduce the number of hops and to improve fault tolerance by having fewer components involved. And it simplifies the administration by allowing the monitoring and management of the network, the compute, and the messaging as a unified whole."

We are hunting down users of the Pluribus Freedom Server-Switch, including TIBCO customers, to see what kinds of advantages they are seeing from the combination of the two.

The base Freedom E68-M machine costs $25,000, which Tripathi says is what you would pay for a 64-port top-of-rack switch with 10 Gb/sec ports. In addition to Layer 2 switching functions, it includes high availability clustering for fabric-wide management across the switches, network monitoring functions, and various other bundled services. These include an OpenStack cloud controller, Wireshark network protocol analysis, Argus network auditing, plus domain name serving, network address translation, and other services commonly run on servers or network appliances. If you want to go beyond that and add more application muscle to the switches as well as Layer 3 routing, then the base F64-M machine costs $33,000. Adding more hardware and software to the line, the F64-L costs between $45,000 and $65,000, and the top-end F64-XL ranges between $60,000 and $80,000. Adding the KVM hypervisor layer costs another $10,000 per switch and is separate from the Netvisor license, which is bundled into the cost of the hardware.
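To see how those list prices add up for a small cluster, here is a minimal Python sketch of the arithmetic. It uses only the prices quoted above and follows the F64 naming from the hardware description earlier in the article. The function name, the dictionary layout, and the example deployment are illustrative assumptions, not a Pluribus price calculator, and the sketch ignores any discounts, support contracts, or optional flash cards.

```python
from typing import Tuple

# List prices quoted in the article, in US dollars, as (low, high) ranges.
# Fixed-price models simply repeat the same figure for low and high.
BASE_PRICE = {
    "E68-M": (25_000, 25_000),
    "F64-M": (33_000, 33_000),
    "F64-L": (45_000, 65_000),
    "F64-XL": (60_000, 80_000),
}
KVM_ADDON = 10_000  # optional KVM hypervisor layer, per switch


def price_range(model: str, with_kvm: bool = False,
                switches: int = 1) -> Tuple[int, int]:
    """Return the (low, high) list price for `switches` units of `model`."""
    if switches < 1:
        raise ValueError("need at least one switch")
    low, high = BASE_PRICE[model]
    extra = KVM_ADDON if with_kvm else 0
    return switches * (low + extra), switches * (high + extra)


if __name__ == "__main__":
    # Hypothetical example: a two-node pair of top-end machines with KVM.
    low, high = price_range("F64-XL", with_kvm=True, switches=2)
    print(f"${low:,} to ${high:,}")
```

By this arithmetic, a hypothetical two-node pair of F64-XL machines with the KVM layer would list at between $140,000 and $180,000, while a four-node group would come to twice that.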

EnterpriseAI