
Enterprise Cassandra Gets a Storage Boost 

As enterprises embrace NoSQL and other emerging database technologies, they are encountering familiar teething problems as they attempt to scale out, including stress points such as the soaring cost of hardware like fast storage media.

In the thick of the enterprise shift to NoSQL is the Apache Cassandra database and leading platform vendor DataStax. Network partner Datagres, the file-level storage intelligence specialist, this week rolled out an automated tiered storage package for the DataStax enterprise implementation of Cassandra that leverages fast storage media like flash and SSDs. The goal is to boost database performance that is slowing as data volumes grow.

The Datagres "PerfAccel" platform unveiled on Tuesday (Sept. 22) targets Apache Cassandra I/O performance requirements using a tiered storage approach that is attached to each DataStax node and designed to monitor performance. A key hurdle for Cassandra users is the sheer number of files the database must read. "Knowing which 10 terabytes [to focus on] is today's big challenge," Datagres CEO Ranajit Nevatia noted in an interview.

(Datagres, Palo Alto, Calif., also announced database industry veteran Nevatia's appointment as CEO and president this week.)

The Datagres approach to boosting Cassandra performance relies on a combination of caching and file-level data analytics. Those capabilities are combined to deliver what the company calls "application I/O controls for server-based operational storage intelligence." It works by supplementing existing data disks with small but fast storage media like SSD/flash drives. Cassandra reads large volumes of data from disk. Datagres said its approach allows most of those reads to be served from cache, thereby preventing IOPS "saturation" of the underlying data disk.
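The general idea of serving repeated reads from a small, fast tier can be illustrated with a minimal read-through cache. The sketch below is not Datagres code and makes no claim about how PerfAccel works internally; the class name `TieredReader`, the least-recently-used eviction policy, and the use of an in-memory dictionary as a stand-in for the slow data disk are all assumptions chosen for illustration.

```python
from collections import OrderedDict


class TieredReader:
    """Hypothetical read-through LRU cache standing in for a small, fast tier."""

    def __init__(self, backing_store, cache_capacity):
        # backing_store: dict of block id -> bytes, a stand-in for the slow data disk
        self.backing_store = backing_store
        self.cache_capacity = cache_capacity
        self.cache = OrderedDict()  # ordered oldest -> most recently used
        self.cache_hits = 0
        self.disk_reads = 0

    def read(self, block_id):
        if block_id in self.cache:
            # Hit: served from the fast tier, no I/O against the data disk.
            self.cache.move_to_end(block_id)
            self.cache_hits += 1
            return self.cache[block_id]
        # Miss: read from the slow tier, then keep a copy in the fast tier.
        data = self.backing_store[block_id]
        self.disk_reads += 1
        self.cache[block_id] = data
        if len(self.cache) > self.cache_capacity:
            self.cache.popitem(last=False)  # evict the least recently used block
        return data


# Assumed workload: 100 blocks on disk, but only 5 "hot" blocks read repeatedly.
disk = {i: bytes([i]) * 4 for i in range(100)}
reader = TieredReader(disk, cache_capacity=10)
for _ in range(20):
    for block_id in range(5):
        reader.read(block_id)

print(reader.disk_reads, reader.cache_hits)  # 5 disk reads, 95 cache hits
```

In this toy workload only the first pass over the hot blocks touches the backing store; the remaining reads are absorbed by the cache, which is the effect the article describes as avoiding IOPS saturation on the data disk.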

The I/O analytics functionality is used to help determine optimum allocation of storage resources as well as to help determine the best database design. Those design considerations include the number of nodes required and how big those nodes need to be, Nevatia explained.

The knock on Cassandra is that it remains hard to deploy and is finicky once it is up and running. Those issues are magnified in enterprise deployments. "The reason it is hard to deploy is because everybody is guessing" when trying to scale Cassandra in the enterprise, Nevatia added.

Hence, Datagres is attempting to fill out the Cassandra enterprise infrastructure with a tiered storage tool that can help size Cassandra nodes and configure performance parameters.

At the same time, it attempts to tackle soaring hardware costs related to scaling NoSQL databases. Along with leveraging the advantages of PCIe/NVMe, flash/SSD and HDDs all running on the same server, the Datagres approach is billed as reducing overall infrastructure cost since its PerfAccel software requires fewer hardware resources.

The Datagres software also runs on the same platform on which Cassandra is running, whether bare metal, virtual, on-premises or in the cloud.

Datagres said PerfAccel for DataStax Enterprise is available now. The grid-scale management and acceleration software was released in conjunction with the annual Cassandra Summit hosted this week by DataStax in Santa Clara.

About the author: George Leopold

George Leopold has written about science and technology for more than 30 years, focusing on electronics and aerospace technology. He previously served as executive editor of Electronic Engineering Times. Leopold is the author of "Calculated Risk: The Supersonic Life and Times of Gus Grissom" (Purdue University Press, 2016).

EnterpriseAI