
Xeon Phi Coprocessors Accelerate Platform Symphony Grids 

Financial services institutions running risk analytics, Monte Carlo simulations, or pricing algorithms using IBM's Platform Symphony grid software on their clusters are getting a computational boost now that Symphony is able to dispatch work to Intel's Xeon Phi coprocessors.

The support for the Xeon Phi coprocessors with Symphony is in some ways better than that already offered by IBM for several generations of Nvidia Tesla GPU coprocessors, explains Scott Campbell, product manager for Symphony at IBM.

Because the Xeon Phi card is a parallel X86 chip, it can boot its own Linux kernel and be given its own IP address on the network of gridded machines managed by Symphony. Even though a Xeon Phi is physically located inside a specific server node in a cluster, it can be exposed directly to Symphony, which can dispatch code right to that Xeon Phi card. This is called native mode access.

"What this means from an application perspective is that it looks like a native resource to Symphony and can be managed as such," says Campbell. "Where we are seeing some interest in this is in parent-child relationships. This is where a parent process is running on a standard X86 core but it would be able to spawn thousands of child processes that run on the Xeon Phi cards. So rather than have these child processes consume a tremendous amount of cores on the servers in the grid, you push this work out to the Xeon Phi. And in this case, it is not so much about the execution time, but the resource management time and being able to cut down overall compute node count."

Campbell says that early adopter customers of the Symphony-Xeon Phi combo are writing their code in C and C++, not Java or C#, but over time there will be interest there, too. "I think it is a little early on the application development side, and we are just now learning from customers what types of things they can accelerate," Campbell adds.

Symphony does not require that all machines in a cluster be equipped with Xeon Phi coprocessors – or GPUs for that matter, if enterprises decide to go with those to boost their compute. Servers with particular accelerators can be corralled into resource groups and Symphony is smart enough to dispatch the right kind of applications to the right machines. The Symphony grid software also allows for multiple accelerators to be crammed into a single server node; there is not a requirement for a one-to-one pairing between CPUs and either Xeon Phi or GPU coprocessors.

To use Xeon Phi coprocessors, customers have to upgrade to Symphony 6.1.1, which starts shipping on October 25. The support for coprocessors is not included in the base price and you have to pay a supplemental fee to add it.

IBM did not supply pricing for either the base Symphony license or this add-on, and has not publicly discussed pricing since it acquired Platform Computing, the pioneer in grid computing, back in October 2011. Prior to the acquisition, Platform said that Symphony cost $250,000 for a license for a 100-node cluster. It is unclear if IBM has kept the price the same.

GPU coprocessors have been supported since the Symphony 5.2 release, which came out in June 2012. But GPUs are supported in what is called offload mode, which is not as good, from the Symphony perspective, as the native mode available in the Xeon Phi coprocessors.

"We have less control over the application logic executing on the GPUs," says Campbell. "It is basically a CUDA interface that the application is leveraging. We can do monitoring, but we do not control the resource allocation like we can with native mode."

What that means is that Symphony users who want to accelerate applications with GPUs have to do some fussing in those applications to split the code and execute different bits on the CPUs and the GPU coprocessors. For now.
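
To see what that fussing looks like, here is a minimal, generic CUDA sketch, not Symphony-specific code: the application itself has to allocate device memory, copy inputs over the PCI-Express bus, launch the kernel, and copy results back, while the grid scheduler only ever sees the host-side process.

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// The GPU-side "bit" of the application: a trivial kernel standing in
// for the pricing or risk math that would really run here.
__global__ void scale(const float* in, float* out, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * factor;
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    float* h_in  = new float[n];
    float* h_out = new float[n];
    for (int i = 0; i < n; ++i) h_in[i] = 1.0f;

    // The CPU-side code must manage the device explicitly -- this is the
    // splitting work that offload mode requires and native mode avoids.
    float *d_in, *d_out;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);

    scale<<<(n + 255) / 256, 256>>>(d_in, d_out, 2.0f, n);

    cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);
    std::printf("out[0] = %f\n", h_out[0]);

    cudaFree(d_in); cudaFree(d_out);
    delete[] h_in; delete[] h_out;
    return 0;
}
```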

At some point, Nvidia will have a combination of an ARM core and a GPU from its "Project Denver" efforts, and that will allow a Linux kernel to be booted on the GPU. While we know there are going to be hybrid ARM-GPU parts coming out in the Tegra line aimed at smartphones, tablets, PCs, and possibly servers in 2015 or so, it is less clear what Nvidia's plans are for discrete GPU coprocessors in the Tesla line. We know that the "Maxwell" GPU coprocessor due next year will have unified virtual memory, which means the CPU and GPU will be able to see a single address space, making explicit data movement between the two chips unnecessary. We also know that the "Volta" GPU coprocessor expected maybe in 2016 or so will have stacked DRAM memory to ramp up the memory bandwidth on the accelerator. Nvidia has not said whether either Maxwell or Volta will have an ARM core embedded on it.
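
To illustrate why that single address space matters, here is the same operation rewritten against CUDA's managed-memory style of allocation, used here only as a stand-in for whatever form Maxwell's unified virtual memory ultimately takes: the explicit cudaMemcpy calls from the previous sketch simply disappear.

```cpp
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(float* data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;
    float* data;

    // One allocation visible to both CPU and GPU -- no explicit
    // host/device copies, which is the promise of a unified address space.
    cudaMallocManaged(&data, n * sizeof(float));
    for (int i = 0; i < n; ++i) data[i] = 1.0f;

    scale<<<(n + 255) / 256, 256>>>(data, 2.0f, n);
    cudaDeviceSynchronize();  // make the GPU's writes visible to the CPU

    std::printf("data[0] = %f\n", data[0]);
    cudaFree(data);
    return 0;
}
```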

The early users of GPU acceleration on the Symphony grid software have been in the financial services industry, of course. But Campbell says it has not been widely adopted, and certainly not anything along the lines of GPU acceleration in the supercomputing market.

The Symphony grid software comes in four editions. The Developer Edition can be used on two hosts and includes the low-latency grid as well as support for the MapReduce engine that Platform embedded in Symphony with the 5.2 release.

With this MapReduce support, Symphony supports the MapReduce APIs that are part of Hadoop, but replaces Hadoop's JobTracker and TaskTracker components with its own scheduler, which is a lot faster at allocating resources to MapReduce jobs. Hadoop data can be stored in the Hadoop Distributed File System (HDFS), with Symphony layered on top of it, or HDFS can be swapped out for IBM's own General Parallel File System (GPFS). Generally speaking, Campbell says that using Symphony to schedule MapReduce jobs can boost Hadoop application performance anywhere from 40 percent to as much as three to four times, depending on the nature of the Hadoop algorithms.

The Express Edition adds dynamic resource allocation and scales up to 240 cores, but takes out this MapReduce support, as does the Standard Edition, which scales to 5,000 servers and 40,000 cores and includes desktop, server, and virtual machine harvesting to add compute cycles to the grid. The full-on Advanced Edition has all the bells and whistles, including the MapReduce functions and multi-cluster management; this edition scales up to the same 5,000 servers and 40,000 cores per grid. Symphony can manage up to 300 applications running on the grid at the same time, and can allocate as many as 10,000 cores to a single application; the average latency for transactions running on the grid is under 1 millisecond, which is why financial services companies use it.

Incidentally, the Load Sharing Facility (LSF) workload scheduler, developed by the former Platform and aimed predominantly at supercomputing customers, also supports both GPU and Xeon Phi coprocessors. But the support is limited to the offload mode described above.

By the way, LSF is not just used by academic and government supercomputer labs, despite the original reason for developing it many years ago. Bill McMillan, LSF product manager at IBM, tells EnterpriseTech that credit card companies, warehouse distributors, and retailers have adopted LSF to manage the jobs on their clusters, and significantly, says McMillan, SAS Institute also licenses LSF and uses it underneath some of its analytics applications.
