Hardware Strategies for Horizontal and Vertical Scalability – And Why It Matters
Since almost all NoSQL databases have built-in high availability, it is easy to change hardware configuration which involves shutting down, replacing, or adding a server to a cluster while maintaining the continuous operations.
Seamless, online scalability is one of the key attributes of NoSQL systems which are sharded, shared nothing databases. Users can start with a small cluster and then grow the cluster as throughput and/or storage requirements grow. Similarly, users can also contract the cluster as throughput and/or storage requirements shrink due to decreased workloads. Hence, most NoSQL databases are considered to have horizontal scalability. Figure 1 shows horizontal scalability during different operational workloads.
Scalability is appealing to businesses for a variety of reasons. In many cases, the peak workload is not known in advance of the initial setup of a NoSQL database cluster. Scalability enables the organization to deploy enough capacity to meet the expected workload without unnecessary initial over-expenditure on the hardware. As the workload increases, more hardware can be added to meet the peak demand. As the workload decreases, the hardware can be reclaimed and re-purposed for other applications. The cluster can grow and shrink as the workload changes, with zero downtime.
Most NoSQL systems claim linear or near-linear scalability as the cluster size grows. Linear scalability simply means that when more hardware is added to the cluster, it is expected to deliver the performance that is roughly proportional to its capacities. In other words, if the number of servers in the NoSQL cluster is doubled, it is expected to handle roughly double the workload (throughput and storage) compared to the original configuration, if the additional servers have the same processing and storage capabilities.
With the high availability of NoSQL databases it is easy to replace hardware incrementally, which involves shutting down a machine, removing it from the cluster, bringing in a new server, and adding it to the cluster. Everything continues to work without downtime or interruption of the application.
From a capital expenditure perspective, it can be more cost effective to purchase newer and “bigger” servers because hardware costs decline over time. Newer hardware has more processing and storage capability relative to older generations of hardware at the same price. Replacing older hardware with newer, more powerful and capable hardware gives NoSQL database vertical scalability.
From an operational cost perspective (OPEX) , a cluster with fewer servers is preferable to a cluster with a large number of servers, assuming both options deliver the same performance and storage capabilities. Hardware needs to be actively managed, needs space, cooling and consumes power. Smaller clusters can improve operational efficiency.
Over time, the combined effect of growing the number of servers in a cluster and/or replacing old servers with newer hardware will result in a cluster that has a mix of servers of varying processing and storage capabilities and capacities. It is common to find such “mixed” clusters in production NoSQL applications that have been upgraded over a period of time.
In “mixed” cluster scenarios, it is important to choose a NoSQL solution that can leverage hardware with varying processing and storage capabilities in order to address the application requirements efficiently and effectively. As mentioned, most NoSQL products claim horizontal scalability. But does an administrator really want to maintain a large cluster for NoSQL product X if a smaller cluster running a different NoSQL product will do the job, or exceed the requirements? The example in Figure 2 shows that over time, and with increased storage and processing power, a 12-machine system could be replaced with just a three-machine system.
When evaluating a NoSQL product, it is wise to consider the horizontal as well as vertical scalability characteristics. To deliver the most cost-effective results for modern applications, NoSQL databases should be designed with both horizontal and vertical scalability in mind.
For every server deployed in a cluster, the administrator specifies the capacities (e.g., storage) of the server at the time of the initial setup. The NoSQL database should then use this information to automatically assign workloads so that each server process manages a subset of data that is roughly proportional to the storage capacities of that server. A smaller server can then be assigned less workload compared to a larger server. Similarly, each process uses RAM based on the amount of physical memory available on the machine. Figure 3 shows how a NoSQL Database cluster with mixed hardware capacities would work.
A well-designed NoSQL database will distribute the workload across the available hardware based on the capabilities of the server. More importantly, during the cluster creation and expansion/contraction operations, the system ensures that each replica for every shard (remember, that only some NoSQL databases are sharded and have HA built in), is distributed across different physical servers. This “rule” ensures that the failure of any single server will never result in the loss of a complete shard. Also, in steady-state operation, it is possible that servers might be shut down and restarted at various points in time.
A modern NoSQL Database automatically monitors the health and workload on each server in the cluster and rebalances the workload across the servers in order to avoid “hotspots.” All of this should happen automatically and without any administrative intervention, thus ensuring reliable and predictable performance and high availability over mixed clusters and cluster transitions.
When looking for a NoSQL solution, it is important to look for a solution that leverages hardware effectively in order to meet the performance and availability requirements of the most demanding and innovative applications. When deciding which NoSQL solution to use, vertical scalability should be given as much consideration as horizontal scalability, to ensure the lowest TCO and maximum efficiency.
Tim Goh is principal product manager at Oracle; Michael Schulman is senior principal product manager - NoSQL at Oracle.