
SC17 Takeaways: Machine Learning + HPC = ‘Synthesis Modeling’, and Other Developments 

The annual supercomputing confab, SC17, was held last week in Denver, and there were many important announcements. Stories about science, silicon, and systems all vied for attention and center stage at this ~12,000-person event. I’m certain I missed a few important bits and bytes, but here are my observations from what will go down as one of the most impactful HPC events in recent memory. Please forgive the length of this entry; there was just too much good stuff I want to share. If you are short on time, I’ve summarized my key takeaways in the Conclusions section.

Synthesizing Machine Learning with HPC

The real stars of any SC event are the advancements in scientific research and understanding that these monster machines enable. For me, SC17 will be remembered as the year that large HPC projects began to embrace machine learning as a tool, creating synthesis models that combine traditional numerical simulation with the same ML and AI tools that Google and Facebook use for search and advertising targeting.

Solutions using ML, trained on millions of sample simulations, require far less computational horsepower to estimate the result of a simulation than it takes to actually calculate the answer. It’s worth noting that hyperscale computing, which borrowed heavily from HPC, is now giving technology back to the HPC community, delivering the same ML frameworks on the same ultra-fast NVIDIA GPUs that supercomputing centers are using to accelerate science with traditional HPC codes. Every vendor and every supercomputing center at the event highlighted this trend, an approach that could achieve exaflop (10^18 floating-point calculations per second) performance and results long before traditional HPC reaches that milestone. You can see my presentation from the show floor on synthesis modeling here.
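
To make the concept concrete, here is a minimal sketch of the surrogate-model pattern behind synthesis modeling. It is my own illustration in Python, not code from any lab or vendor at the show; the expensive_simulation function is a toy stand-in for a real numerical solver, and the network size and sample counts are arbitrary placeholders.

```python
# Minimal sketch of "synthesis modeling": train an ML surrogate on sampled
# runs of an expensive numerical simulation, then use the cheap surrogate to
# estimate results for new inputs. The "simulation" below is a toy stand-in.
import numpy as np
from sklearn.neural_network import MLPRegressor

def expensive_simulation(params):
    # Placeholder for a costly solver; here just a cheap nonlinear function.
    x, y = params
    return np.sin(3 * x) * np.cos(2 * y) + 0.1 * x * y

rng = np.random.default_rng(0)

# 1) Run the real simulation on a sample of input points (the expensive step).
X_train = rng.uniform(-1, 1, size=(2000, 2))
y_train = np.array([expensive_simulation(p) for p in X_train])

# 2) Train a surrogate model on those (input, output) pairs.
surrogate = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=1000, random_state=0)
surrogate.fit(X_train, y_train)

# 3) Estimate new results with the surrogate instead of re-running the solver.
X_new = rng.uniform(-1, 1, size=(5, 2))
print("surrogate:", surrogate.predict(X_new))
print("simulator:", [expensive_simulation(p) for p in X_new])
```

The expensive step is generating the training set with the real solver on the supercomputer; once trained, the surrogate (often a deep network running on GPUs) can answer many "what if" queries at a tiny fraction of the cost of a full simulation.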

In addition to machine learning, I found the multimedia keynote presentation on the Square Kilometre Array (SKA) to be extremely well crafted and presented. This massive radio telescope is so sensitive it will be able to detect airport radar signals from across the Milky Way galaxy, TV signals (if they exist) from the nearest tens, perhaps 100, stars, and a cell phone from across the planet. I suspect that if more people took this much care to explain the value of science, we might find ourselves living in a different and better world. Note that the United States has withdrawn support for the SKA. You can watch the keynote here, although it cannot possibly have the full impact without the >100-foot screen.

Figure 1: The two leading administrators of the global SKA project, which may someday find ET as well as enable us to peer into the very beginning of time, used an amazing array of screens to tell a story of scientific courage and daring. Source: Moor Insights & Strategy

Silicon: AMD, Cavium, Intel and NVIDIA

Intel Cancels Knights Hill; Cavium ARM Wins HPE and Cray Designs; AMD EPYC Surprises with Flops/$; NVIDIA GPU Cloud Adds HPC Containers

The week started with a huge miss, as Intel announced that the Xeon Phi product roadmap would no longer include the highly anticipated 10nm Knights Hill (KNH) multicore processor. In 2015, Intel won the Argonne National Laboratory’s $200M Aurora supercomputer procurement as the prime contractor, using the KNH processor. Aurora was to become one of the world’s largest supercomputers, and the cancellation of that platform leaves Argonne, and others who bought Knights Landing supercomputers, in limbo. Intel says it will bring out a replacement in 2021, but it has not disclosed what that architecture might look like. I suspect Intel will add on-die accelerators to a future edition of its mainstream Xeon processor, consolidating its product line to reduce costs. It remains unclear how this move may affect the machine learning version of Phi, code-named Knights Mill, which was scheduled to start shipping this year.

On the brighter side, the ARM-based ThunderX2 from Cavium (now being acquired by Marvell) landed two premier design wins: Cray and Hewlett Packard Enterprise, the leading vendor of high-end supercomputers and the #1 HPC and server market share leader, respectively. As a battle-scarred veteran of the ARM server movement (remember Calxeda and HP Moonshot?), I can finally celebrate this long-awaited milestone in the diversification of the datacenter. Getting a Taiwanese ODM to support your chip is one thing, but attracting these two leading companies to support your device is simply huge. HPE announced support for ARM in the Apollo 70 server, while Cray showed off its 8-node Cavium-based Cray XC50 system, in both liquid- and air-cooled versions, complete with the tried-and-true Aries interconnect. Perhaps more importantly, Cray now brings its vaunted compilers and development tools to ARM.

Figure 2: The Cray XC50 supercomputer is based on the Cavium ThunderX2 ARM SOC, with 48 cores each and 8 SOCs per tray. Yes, those are some massive heat sinks! Source: Moor Insights & Strategy

The AMD booth was always jammed with people curious about the 32-core EPYC x86 SOC, which the company claims can deliver up to three times the floating-point price-performance of Intel’s pricey Xeon Platinum 8180M. That’s impressive, but of course nobody pays list price for HPC chips in the supercomputing market. Nonetheless, I believe AMD has a real shot at rebuilding market share, starting in HPC, but it will take time and a compelling roadmap beyond the first EPYC.

In parallel, Jon Masters, Red Hat’s ARM Architect, was finally able to announce Red Hat Enterprise Linux (RHEL) support for ARM-based SOCs, a goal he has tirelessly championed as Red Hat’s ambassador to the ARM community, spanning seven years and millions of miles.

On the GPU side, NVIDIA announced that Volta GPUs are 1) now available as a service from every major cloud provider (unlike their Pascal predecessors), 2) now supported by every major server OEM, and 3) powering optimized application containers for five popular HPC applications on the NVIDIA GPU Cloud. Scientists can download these pre-configured application stacks to the cloud or to their on-premises racks of GPU-equipped servers, simplifying adoption and deployment of accelerated apps in much the same way that NVIDIA and Amazon Web Services are now doing for machine learning.
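
To give a rough sense of what that deployment looks like in practice, here is a hedged sketch that shells out to Docker from Python on a GPU-equipped node. The image tag, directory paths, and application command line are illustrative placeholders rather than documented NGC entries; consult the NGC catalog and your container runtime’s documentation for the real ones.

```python
# Rough sketch of pulling and running a pre-built, GPU-accelerated application
# container from a registry such as NVIDIA GPU Cloud (nvcr.io). Assumes Docker
# and the NVIDIA container runtime are installed on a GPU-equipped node.
# The image tag, paths, and application arguments are hypothetical placeholders.
import subprocess

IMAGE = "nvcr.io/hpc/namd:latest"      # hypothetical image tag for illustration
WORKDIR = "/scratch/my_simulation"     # host directory holding the input deck

# Download the pre-configured application stack (app, libraries, CUDA runtime).
subprocess.run(["docker", "pull", IMAGE], check=True)

# Launch it with GPU access and the working directory mounted into the container.
subprocess.run(
    [
        "docker", "run", "--rm",
        "--runtime=nvidia",                        # expose the node's GPUs
        "-v", f"{WORKDIR}:/workspace",             # mount input/output data
        IMAGE,
        "namd2", "+p8", "/workspace/input.namd",   # hypothetical app invocation
    ],
    check=True,
)
```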

Last but not least, IBM showed off its POWER9 chip in a variety of vendors’ booths, highlighting the OpenCAPI interconnect with Xilinx accelerators in a wide range of applications. OpenCAPI aims to provide an open, high-performance, low-latency, cache-coherent interconnect for CPUs, FPGAs, and GPUs. At the OpenCAPI booth, Mellanox and Xilinx showed off the Innova-2 programmable interface, which offloads compute into the network with Xilinx Kintex UltraScale FPGAs, claiming 6X the performance and one-tenth the TCO for data security applications. In fact, Xilinx had a fairly pervasive presence across the exhibition floor, showing off its acceleration capabilities in many application and infrastructure domains.

Figure 3: AMD's booth was jammed on the convention floor of the SC17 event in Denver. Source: Moor Insights & Strategy

In my opinion, the IBM POWER architecture will need a big win besides the Oak Ridge Summit supercomputer to remain relevant, and perhaps even to survive beyond this latest instantiation. And while Google has indicated it is interested, we have yet to learn about the internet giant’s plans for POWER in its massive datacenters.

Figure 4: The Xilinx-based Mellanox Innova-2 accelerator card, which supports PCIe Gen4 and OpenCAPI, enables compute tasks to be offloaded to the NIC, providing number crunching on the way into or out of the server. Source: Moor Insights & Strategy

Systems: Dell EMC, HPE and Lenovo

All Three OEMs Announce HPC and Machine Learning Strategies, as well as GPU-Optimized Servers

The Server Big 3 (Dell EMC, HPE, and Lenovo) outlined their strategies for both HPC and machine learning, and announced new servers with support for multiple GPUs. Lenovo was particularly aggressive, proclaiming that it intends to catapult past both Dell and HPE to take the #1 spot in HPC by 2020. Accomplishing that goal would be an impressive feat, given the large presence, portfolios, and sales organizations HPE and Dell have targeting this market.

All three companies recognize that their customers need help bringing machine learning into their datacenters and applications, and they are offering a bevy of tools, training, and systems designed to ease that path. While most GPUs for machine learning have historically been installed in hyperscale datacenters, the major server OEMs are preparing for a time when their enterprise customers need to install hardware on-premises, where their data resides.

The other system that caught my attention was the D-Wave 2000Q, the first quantum computer to support 2,000 qubits, along with some interesting deep learning and e-commerce experimental results. Executives at the company believe that commercial applications for quantum computing will begin to materialize when these systems reach the 5,000-10,000 qubit range, and they hope to achieve that in the next few years.
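
For readers wondering what all those qubits actually do, here is a tiny, hedged illustration of the problem class an annealer like the 2000Q targets: minimizing a QUBO (quadratic unconstrained binary optimization) objective, where each binary variable maps, roughly, to one qubit. The coefficients below are made up for illustration, and brute force stands in for the annealing hardware.

```python
# Tiny illustration of the problem class a quantum annealer targets: finding
# the lowest-energy assignment of a QUBO objective E(x) = sum_ij Q[i,j]*x_i*x_j
# over binary variables x_i in {0, 1}. Brute force below stands in for the
# annealer and is only feasible for toy sizes; the coefficients are made up.
from itertools import product

Q = {
    (0, 0): -1.0, (1, 1): -1.0, (2, 2): -2.0,
    (0, 1): 2.0,  (1, 2): 0.5,  (0, 2): -0.5,
}
n = 3  # number of binary variables (roughly, qubits needed)

def energy(x):
    return sum(coeff * x[i] * x[j] for (i, j), coeff in Q.items())

best = min(product((0, 1), repeat=n), key=energy)
print("lowest-energy assignment:", best, "energy:", energy(best))
```

More qubits, and richer connectivity between them, let larger problems be mapped directly onto the hardware instead of being decomposed, which is why the company is focused on reaching the 5,000-10,000 qubit range.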

Conclusions

With so much going on at SC17, I thought it might help to recap my key takeaways in “science, silicon and systems” from this momentous event:

  1. Intel now has a 2-3 year gap in HPC-specific silicon, beyond the impressive but standard Xeon Skylake processors, given the cancellation of Knights Hill.
  2. NVIDIA has consolidated its leadership in scientific and machine learning acceleration (every cloud, every vendor, with NVIDIA GPU Cloud greatly simplifying adoption).
  3. ARM finally has a competitive server part in Cavium’s ThunderX2, with HPE and Cray jumping on board.
  4. AMD EPYC is exciting, but the company and its investors should be patient.
  5. IBM’s impressive POWER9 is here on schedule. Time (and Google) will tell if it matters.
  6. Xilinx is showing up in the datacenter as a programmable acceleration platform in many emerging applications.
  7. Dell EMC, HPE, and Lenovo are doubling down on HPC, with more GPU-based servers for HPC and machine learning.
  8. The future may indeed belong to quantum computing, but don’t hold your breath.
  9. Science matters.

Karl Freund is senior analyst, machine learning and HPC, Moor Insights & Strategy.

Disclosure: Moor Insights & Strategy, like all research and analyst firms, provides or has provided research, analysis, advising and/or consulting to many high-tech companies in the industry mentioned in this article, including AMD, Intel, NVIDIA, Dell EMC, HPE, Xilinx and others. The author does not have any investment positions in any of the companies named in this article.
