
MIT, Facebook Create Low-Latency Fastpass TCP Replacement 

Researchers from MIT and Facebook have proposed a centralized datacenter network architecture that discards the distributed approach to packet transmission and path selection used in traditional networks, replacing it with a centralized "arbiter" that decides when, and along what path, each packet is transmitted.

The centralized "zero-queue" network architecture, dubbed Fastpass, blends two fast algorithms: the first determines the timeslot in which each packet should be transmitted, while the second decides what path each packet should take. The researchers said the architecture relies on an efficient protocol between the network edge and the all-powerful arbiter.
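The paper describes timeslot allocation as a form of maximal matching and path selection as a form of edge coloring. A minimal sketch of how such a two-step arbiter decision could look, with all names and data structures assumed for illustration rather than taken from the Fastpass source:

```python
# Illustrative sketch, not the authors' implementation: a greedy
# maximal-matching timeslot allocator followed by per-slot path
# assignment in a two-tier topology. All names are assumptions.

def allocate_timeslot(demands):
    """Choose a maximal set of (src, dst) pairs so each endpoint sends
    at most one packet and receives at most one packet in this slot.
    `demands` maps (src, dst) -> number of outstanding packets."""
    busy_src, busy_dst = set(), set()
    allocated = []
    for (src, dst), count in demands.items():
        if count > 0 and src not in busy_src and dst not in busy_dst:
            allocated.append((src, dst))
            busy_src.add(src)
            busy_dst.add(dst)
    return allocated

def assign_paths(allocated, num_cores):
    """Give each allocated pair a core switch so that no uplink or
    downlink carries two packets in the same timeslot."""
    used_up, used_down = set(), set()
    paths = {}
    for src, dst in allocated:
        for core in range(num_cores):
            if (core, src) not in used_up and (core, dst) not in used_down:
                used_up.add((core, src))
                used_down.add((core, dst))
                paths[(src, dst)] = core
                break
    return paths

# Example: endpoint 0 cannot send to both 1 and 2 in the same slot.
slots = allocate_timeslot({(0, 1): 3, (1, 0): 1, (0, 2): 2})
print(slots)                    # [(0, 1), (1, 0)] -- (0, 2) waits its turn
print(assign_paths(slots, 4))   # e.g., {(0, 1): 0, (1, 0): 0}
```

Because every endpoint is matched at most once per slot and paths are deconflicted per slot, no packet should ever wait in a switch queue, which is the "zero-queue" property the researchers are after.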

They pondered whether networks could be designed to eliminate queuing delays, resulting in higher utilization of computing resources and a network able to support "multiple resource allocation objectives" among flows, applications, and users. They contended that such a design would be especially useful in datacenters, where queuing is a primary cause of latency and operators must contend with multiple users and a variety of workloads.

The researchers tested Fastpass by deploying it in a portion of a Facebook datacenter network. They claimed a roughly 240-fold reduction in queue lengths compared with current networks (from 4.35 MB down to 18 KB) and more consistent flow throughputs than baseline Transmission Control Protocol (TCP). They added that the arbiter implementation was able to schedule 2.21 terabits per second of traffic in software running on eight cores. MIT and Facebook also reported a 2.5-fold reduction in the number of TCP retransmissions on a latency-sensitive Facebook service.


The researchers said Fastpass could be deployed incrementally in a datacenter network. Traffic bound for endpoints outside the deployment would use Fastpass to reach the boundary, where it would be handed off to an external network. While the deployment focused on a single arbiter responsible for all traffic within a deployment boundary, the researchers also contemplated larger deployments.

They acknowledged that packets could be lost under their scheme, since communication between endpoints and the arbiter is itself not scheduled. A Fastpass Control Protocol (FCP) was therefore designed to protect against such loss. Without FCP, an endpoint's request or the arbiter's response could be dropped, meaning the corresponding timeslot would never be allocated and some packets would remain "stuck" in the endpoint's queue.
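FCP conveys aggregate demand counts, which makes retransmission safe: resending the latest totals cannot double-allocate. A minimal sketch of that idea, with the class and method names invented for illustration:

```python
# Hedged sketch of idempotent demand reporting in the spirit of FCP.
# The endpoint reports *cumulative* packet counts per destination, so
# a lost request or allocation is masked by the next message.

import time

class DemandReporter:
    def __init__(self, send_fn, timeout=0.001):
        self.total_demand = {}    # dst -> cumulative packets requested
        self.acked_demand = {}    # dst -> cumulative packets arbiter saw
        self.send_fn = send_fn    # transmits a demand message to the arbiter
        self.timeout = timeout
        self.last_sent = 0.0

    def enqueue(self, dst, npackets):
        self.total_demand[dst] = self.total_demand.get(dst, 0) + npackets
        self.maybe_send()

    def maybe_send(self):
        now = time.monotonic()
        pending = {d: t for d, t in self.total_demand.items()
                   if t > self.acked_demand.get(d, 0)}
        if pending and now - self.last_sent >= self.timeout:
            self.send_fn(pending)   # safe to resend: totals are idempotent
            self.last_sent = now

    def on_allocation(self, dst, cumulative_granted):
        # Responses carry cumulative grants, so the loss of an earlier
        # response is recovered automatically by the next one.
        self.acked_demand[dst] = max(self.acked_demand.get(dst, 0),
                                     cumulative_granted)
```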

FCP was implemented as a Linux transport protocol running over the Internet Protocol. A Fastpass queuing discipline (qdisc) queued outgoing packets before handing them to the network interface card driver, and used an FCP socket to send demands to, and receive allocations from, the arbiter.
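As a rough Python analogue of that endpoint-side behavior (the real qdisc lives in the Linux kernel and is written in C), the following sketch holds packets per destination, reports demand, and releases a packet to the NIC only when its timeslot fires. All names are hypothetical:

```python
from collections import deque

class FastpassQueue:
    """Toy model of the Fastpass qdisc's role: packets wait at the
    endpoint, not in the network, until the arbiter says go."""

    def __init__(self, report_demand):
        self.queues = {}                    # dst -> deque of waiting packets
        self.report_demand = report_demand  # e.g., sends an FCP demand message

    def enqueue(self, dst, packet):
        self.queues.setdefault(dst, deque()).append(packet)
        self.report_demand(dst, 1)          # ask the arbiter for one timeslot

    def on_timeslot(self, dst, nic_send):
        # Invoked when an allocation for `dst` arrives: release exactly
        # one packet straight to the NIC driver for this timeslot.
        q = self.queues.get(dst)
        if q:
            nic_send(q.popleft())
```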

The researchers claimed their architecture simultaneously eliminated in-network queuing, achieved high throughput, and supported a range of flow and application resource allocation objectives. All this was accomplished, they noted, in a "real-world datacenter network."

They also tended to anthropomorphize the Fastpass architecture, for example referring to it as "fair" and exhibiting "fairness" in a research paper to be presented at the SIGCOMM '14 conference. The researchers have released their source code.

About the author: George Leopold

George Leopold has written about science and technology for more than 30 years, focusing on electronics and aerospace technology. He previously served as executive editor of Electronic Engineering Times. Leopold is the author of "Calculated Risk: The Supersonic Life and Times of Gus Grissom" (Purdue University Press, 2016).
