Achieving High Performance Computing without the Supercomputer
When most people hear the term "High-Performance Computing" (HPC), they picture well-funded research laboratories, such as CERN with its Large Hadron Collider, where sheer processing power is critical to specialized scientific and research missions. Such computing clusters are often capable of tens of trillions of floating-point operations per second (FLOPS), enough to serve the most demanding use cases, such as high-frequency trading, genomic sequencing, computational fluid dynamics, or real-time fraud detection.
Specialized supercomputing is still evolving, and some interesting alternatives are emerging: HPC in the cloud, and massive "volunteer-driven" HPC projects in which large numbers of loosely coupled volunteer machines execute "embarrassingly parallel" workloads while connected over the web. Even so, it remains out of reach for many organizations because of the very high cost of building and maintaining such high-performing grids. For government agencies in particular, which face growing requirements for larger-scale data processing and real-time, data-driven execution without the budget for supercomputing capabilities, the looming question becomes, "Is it possible to achieve HPC-like capabilities without additional investment or time-sharing on a supercomputer?" Fortunately, transformative technologies exist to support this emerging class of power users who seek to perform massive data analytics with high data availability and extremely low latency, at any scale.
Horizontal Scaling through In-Memory Computing
Traditionally, computing performance can be scaled in two ways: vertically, by moving to a larger single system such as a supercomputer, or horizontally, by running many smaller systems in parallel as a cluster. The latter is the approach Google takes for its search engine, because it lets the company grow larger and faster than a monolithic supercomputer would allow.
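To make the horizontal approach concrete, here is a minimal Python sketch, offered as an illustration rather than a production design. It splits one large job into slices, hands each slice to a separate worker, and combines the partial results. The function names (split_into_chunks, sum_of_squares, scale_out) are hypothetical, and threads on one machine stand in for the separate servers of a real cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Split items into n_chunks contiguous slices of near-equal size."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks each take one additional item.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    """Per-worker task: each worker only ever sees its own slice."""
    return sum(x * x for x in chunk)


def scale_out(items, n_workers):
    """Fan the work out to n_workers, then combine the partial results."""
    chunks = split_into_chunks(items, n_workers)
    # Assumption: threads stand in for the separate machines of a cluster.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(sum_of_squares, chunks))
    return sum(partials)


data = list(range(1_000))
total = scale_out(data, n_workers=4)
```

Adding capacity in this model means raising n_workers rather than buying a bigger machine, which is the essence of scaling horizontally.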
The benefit of scaling horizontally for agencies is that it allows them to leverage existing computing resources on commodity hardware or virtualized environments, thereby significantly reducing cost. Right now, many government agencies and enterprise IT departments are experiencing tremendous benefits from big data by leveraging horizontally scaling technologies such as Hadoop. Hadoop has a distributed file system and allows batch execution of jobs in a highly parallelized fashion. Unfortunately, many types of computations do not lend themselves to this batch-oriented, disk-based approach, creating demand for a different type of horizontal scalability.
The faster alternative to disk-based scaling solutions lies in in-memory computing (IMC) technology. Processors can access data in RAM tens of thousands of times faster than data on a local disk drive or distributed file system. And although data sets used in HPC applications are often too large to fit in a single machine's RAM, they can fit in the pooled memory of a distributed cluster, which is exactly what IMC relies on.
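The idea of a data set that overflows one machine but fits across a cluster can be sketched in a few lines of Python. This is a toy model under stated assumptions, not any product's API: the InMemoryCluster class and the load helper are hypothetical, each "node" is a plain dictionary standing in for one server's RAM, capacity is counted in records rather than bytes, and records are placed by a simple modulo on an integer ID where a real system would add hashing, replication, and rebalancing.

```python
class InMemoryCluster:
    """Toy model: each node is a dict standing in for one server's RAM."""

    def __init__(self, n_nodes, capacity_per_node):
        self.nodes = [dict() for _ in range(n_nodes)]
        self.capacity_per_node = capacity_per_node  # records, not bytes

    def _node_for(self, record_id):
        # Assumption: integer IDs placed by modulo; real systems hash keys.
        return self.nodes[record_id % len(self.nodes)]

    def put(self, record_id, value):
        node = self._node_for(record_id)
        if record_id not in node and len(node) >= self.capacity_per_node:
            raise MemoryError("node is full")
        node[record_id] = value

    def get(self, record_id):
        # Any record is found by recomputing which node owns it.
        return self._node_for(record_id)[record_id]


def load(cluster, n_records):
    """Store n_records dummy rows in the cluster's pooled memory."""
    for record_id in range(n_records):
        cluster.put(record_id, f"row-{record_id}")


# 1,000 records exceed one node's 300-record limit but fit across four.
cluster = InMemoryCluster(n_nodes=4, capacity_per_node=300)
load(cluster, 1_000)
```

Loading the same 1,000 records into a one-node cluster with the same per-node limit raises MemoryError, which mirrors the single-machine constraint described above.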
By design, in-memory computing not only takes advantage of clustered processing across multiple servers but also moves whole data sets into machine memory. Both can considerably reduce the time it takes to perform data-intensive processing tasks, because they reduce reliance on network-accessible data sources such as databases or network-attached storage (NAS). The most compelling benefit is that IMC allows users to achieve massive scale while relying on their existing computing architectures, essentially turbocharging existing enterprise applications for increasingly demanding data analysis tasks. Agencies that are looking to HPC to address their real-time analytics needs may find that in-memory computing can fulfill many or most of those needs without investing in supercomputing resources.
And for agencies already on the Hadoop path, in-memory computing can still help speed up analysis: rather than relying on the Hadoop Distributed File System (HDFS) to push data out to nodes and collect results, in-memory technologies can be used to move data around and make it available to map-reduce jobs. This modification can accelerate map-reduce jobs by a factor of 10 or even 100, enhancing Hadoop's scalability to handle data sizes far beyond those handled by today's supercomputers.
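For readers unfamiliar with the map-reduce pattern, the following Python sketch shows its shape when the input partitions are already held in memory. It is an illustration only and does not use Hadoop or any in-memory product: the partitions list, map_partition, merge_counts, and run_job are hypothetical names, and the sample records are made up for the example.

```python
from collections import Counter
from functools import reduce

# Assumption: each inner list is a partition already resident in one
# node's memory, so the job never reads its input from disk.
partitions = [
    ["error", "ok", "ok"],
    ["ok", "error", "timeout"],
    ["ok"],
]


def map_partition(records):
    """Map step: count each status within a single partition."""
    return Counter(records)


def merge_counts(left, right):
    """Reduce step: combine two partial counts into one."""
    return left + right


def run_job(parts):
    """Run the map step per partition, then fold the partial results."""
    mapped = [map_partition(p) for p in parts]
    return reduce(merge_counts, mapped, Counter())


result = run_job(partitions)
```

The sketch shows only the structure of such a job; how much faster a real job becomes when its data is served from memory depends on the workload and the platform.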
While federal agencies must balance mission requirements with available funding and personnel, one practical system modernization option is to maximize the speed, value, and scale of existing computing resources. For the highest performance in your agency's computing architecture, consider in-memory computing, a natural complement to current enterprise investments that delivers the processing speed and scalability the future of big data demands.
About the Author:
Fabien Sanglier is Chief Solutions Architect of Software AG Government Solutions, a leading software provider for the US federal government, helping it integrate and dramatically enhance the speed and scalability of its IT systems. He can be reached at Fabien.Sa[email protected].