Advanced Computing in the Age of AI | Friday, March 29, 2024

Orchestrating Simulations on a Formula 1 CFD Cluster with Grid Engine 

Sports and advanced scale computing have teamed up for well over a decade, but in possibly no sport does HPC play a more central, competitive role than in the potboiler world of Formula One auto racing. CFD simulations are run on massive clusters throughout F1 race weekends, simulations that seek the slightest aerodynamic edge to eke out an extra scintilla of speed for vehicles moving at up to 220 miles per hour.

Finding ways to go faster matters not only to the outcome of races, which are roughly 190 miles long and are usually won by 10 seconds or less. It also applies to pre-race qualifying sessions, which dictate where cars start races, including the prestigious pole position. In fact, in a legendary 1997 qualifying session known to all F1 “anoraks,” the three fastest cars finished within a thousandth of a second of each other.

With such close margins of victory, HPC mod/sim can make a decisive difference. Add to this the fact that the FIA (Fédération Internationale de l'Automobile), the F1 governing body, restricts all Formula One teams to a limited number of floating point operations of HPC compute per eight-week reporting period. That places a premium on using teams’ computing resources efficiently and on focusing compute time on the highest-priority, time-critical simulations.

For the Sahara Force India Formula One Team, operating its 7000-core, 30-TFLOPS AMD-based CFD cluster is about more than running simulations that generate actionable adjustments to its car. It’s also about managing the cluster so that mod/sim computing is completed on time and within the compute limits allowed by the FIA, including documentation of cluster use.

For that, the Sahara team’s 50-person CFD technical team relies on Univa Grid Engine, distributed resource management software designed to orchestrate heterogeneous computing environments, directing compute resources to the segments of work they are best suited for while implementing the project priorities established by the HPC managers. Grid Engine software manages workloads automatically and accelerates deployment of applications, clusters, containers and services, both on-premises and in the cloud. In addition, its accounting function produces logs for cluster usage tracking and documentation.
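To make the usage-tracking idea concrete, here is a minimal sketch of how finished-job records might be tallied against a compute budget for a reporting period. Everything in it is an assumption made for illustration: the `JobRecord` shape, the project names, the core-hour unit and the 2,000 core-hour budget are hypothetical. It does not reproduce Grid Engine's accounting log format or the FIA's actual compute formula.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class JobRecord:
    """One finished job. Hypothetical shape, not Grid Engine's log format."""
    project: str
    cores: int
    wallclock_seconds: float


def core_hours_by_project(records):
    """Sum core-hours (cores x wallclock hours) for each project."""
    totals = defaultdict(float)
    for record in records:
        totals[record.project] += record.cores * record.wallclock_seconds / 3600.0
    return dict(totals)


def remaining_budget(records, budget_core_hours):
    """Return how much of a (hypothetical) period budget is still unused."""
    used = sum(core_hours_by_project(records).values())
    return budget_core_hours - used


# Illustrative data only; these are not real team workloads.
records = [
    JobRecord("front_wing", 512, 7200.0),
    JobRecord("diffuser", 256, 3600.0),
    JobRecord("front_wing", 128, 1800.0),
]
usage = core_hours_by_project(records)
left = remaining_budget(records, budget_core_hours=2000.0)
print(usage, left)
```

A per-project breakdown of this kind is one way a team could both document cluster use and see which simulations are consuming the period's allowance.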

“Our CFD cluster has become a very big computer, a beast,” Gaétan Didier, Sahara’s head of computational fluid dynamics, told EnterpriseTech. “So we have Grid Engine that makes sure it runs reliably to process simulations, to update scripts, as a provisioning tool. The cluster couldn’t run now without Univa Grid Engine, it’s part of the cluster itself. There’s no way you can manually do all the tasks, such as software updates, diagnosing problems, looking at CPU and memory usage. So to optimize your cluster and to run it on a daily basis you need a tool like this. Without it our cluster probably would not be running.”

Univa, founded in 2010 and located near Chicago, has more than 200 enterprise customer accounts. While the company originally found traction in the traditional HPC market, according to CEO Gary Tyreman, Univa’s customer base is now 80 percent commercial across the financial services, life sciences and manufacturing industries. The company’s product legacy stems from Grid Engine software originally developed as an open source project by Sun Microsystems, and later acquired by Oracle. Seeing the need for a more commercial-ready product, in 2011, Univa hired Grid Engine software engineers formerly at Oracle and Sun to harden Univa Grid Engine for emerging distributed and virtual computing environments, including clusters, the cloud and containers. In October 2013, Univa completed the acquisition of all Grid Engine IP from Oracle.

It was the hardening of open source Grid Engine code, along with support services, that convinced Sahara Force India Formula One Team to form a technical partnership with Univa two years ago.

“We had been looking at other entities, such as universities, that were attempting to maintain open source Sun Grid Engine,” said Didier, “but Univa was more serious about getting ahead with the development of this tool to allow us to effectively use Grid Engine to integrate it as part of our CFD process.” He explained that prior to Univa, the Sahara team had merely installed the Grid Engine script but lacked the ability to customize the software for their requirements.

“Before we were using maybe 1 or 2 percent of Grid Engine capabilities,” he said, “but now with Univa we’ve got a way to develop it to suit our needs and a way to understand it because there’s a team that can look inside the code, push it and explain it to us in case there’s something that doesn’t work. It’s a way to have us make the most of this tool. With Univa, it’s like having someone part of the CFD department who is interested in our problems and who can help and give us some ideas on how to set up Grid Engine correctly.”

Bill Bryce, Univa product vice president, said the company’s growth maps to the increasing size and complexity of distributed, shared-resource systems spanning multiple departments. This requires senior IT managers to coordinate their efforts and put in place compute usage policies that ensure priority projects are completed first. Another objective: preventing clever end users from grabbing more than their fair share of compute resources. Time-sensitive, highly competitive industries, such as chip design, have been particularly receptive to Grid Engine.
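The two policy goals described here, priority projects first and no user taking more than a fair share, can be illustrated with a toy ordering rule. This is a sketch under stated assumptions, not Grid Engine's scheduling algorithm: the `PendingJob` type, the `dispatch_order` function, the user names and the usage figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PendingJob:
    """A queued job. Hypothetical type for illustration only."""
    name: str
    user: str
    priority: int  # higher values are dispatched first


def dispatch_order(jobs, recent_usage, shares):
    """Order jobs by priority, then by how far each owner is under their share.

    recent_usage maps user -> recently consumed core-hours (assumed unit).
    shares maps user -> relative entitlement; missing users default to 1.0.
    """
    def sort_key(job):
        share = shares.get(job.user, 1.0)
        usage_ratio = recent_usage.get(job.user, 0.0) / share
        # Negate priority so higher priority sorts first; a lower usage
        # ratio then breaks ties in favor of lighter users.
        return (-job.priority, usage_ratio)

    return sorted(jobs, key=sort_key)


# Illustrative queue: one urgent job and two routine jobs from users
# with very different recent consumption.
jobs = [
    PendingJob("sweep-17", "alice", 1),
    PendingJob("race-setup", "bob", 5),
    PendingJob("sweep-18", "carol", 1),
]
recent_usage = {"alice": 900.0, "bob": 100.0, "carol": 50.0}
ordered = [job.name for job in dispatch_order(jobs, recent_usage, shares={})]
print(ordered)
```

In this toy queue the high-priority job runs first regardless of its owner's usage, and between the two equal-priority jobs the one belonging to the lighter user goes ahead of the heavy user's job.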

“On the economic side you can break it down in a pretty straightforward fashion,” Bryce said. “Customers who have time restrictions, like chip design, if you are late to market by a couple of months you may miss your window. If you’re late on a chip that goes into an iPhone, say you’re two months late, that could be the end of your company, game over.”

Univa’s product strategy is to support emerging platforms and frameworks for distributed computing, such as containers. Announced last September, Univa Grid Engine Container Edition incorporates Docker containers into the Univa Grid Engine resource manager. According to Univa, it can run containers at scale, blend containers with other workloads, and support heterogeneous applications and technology environments. In addition, Univa’s Navops is a suite of tools to build, deploy and manage applications across multiple environments, bringing automated installation, enterprise orchestration and scheduling to infrastructures and containers.

Support for hybrid cloud and containers, in which some computing is done on-premises – such as development work and testing – and some in public clouds, represents a major market movement, Bryce explained.

“When they do the big scale out test, or their big deployment, they can push it up into AWS or Google,” he said. “And they recognize that tools like Grid Engine sit in the middle of that because many of these companies use those tools. For us to connect to that environment, to make it transparent for the work to go into these cloud infrastructures in a containerized fashion, that’s where we think things are going in the next five years, and maybe longer.”

EnterpriseAI