Advanced Computing in the Age of AI | Friday, March 29, 2024

Think Before You Swap: Not All File Systems Are Equal in Scaling Cloud Apps 

File systems are like opinions. Everybody has one (or two). But they may not be the right ones for the job. Unfortunately, a common mistake in development is selecting and designing a file system that is not suited to the end deployment requirements of the application.

For example, creating a container with a MySQL database doesn’t do you much good if the data disappears when the container restarts. Container-based services that create, update, and delete file-based data need a persistent file system to store that data on. In fact, the entire shift to container-based computing makes proper file system selection criteria even more important.

Because you are increasingly likely to be dealing with heterogeneous compute environments for different parts of the application development lifecycle, a file system that also provides policy and trust becomes essential. By trust, we mean ensuring that workloads running in your application are using the proper resources, operating in the proper geographies, and using approved, validated software versions and attached services.

Earl Ruby of Apcera


At Apcera, we are building container-based hybrid-cloud management platforms that need to scale quickly and easily. These systems also need to be fully trusted and secure for large enterprises that may be running massive workloads or testing out microservices. Naturally, file system design is something we spend a lot of time thinking about.

Many developers are familiar with the 12 Factor App (http://12factor.net/) methodology. It is a set of 12 guiding principles for developing cloud-based applications. In theory, developers who follow these principles can create applications that move quickly from development to production environments, are portable across different cloud environments, and scale. The two factors most closely related to file system selection are Factor IV and Factor VI.

Factor IV, Treat backing services as attached resources (http://12factor.net/backing-services), is supposed to simplify the application’s approach to interacting with services, such as databases and file systems. When those services are treated as attached resources, you can swap one resource out for another without changing the application’s code. The example on the 12factor.net site states, “if the app’s database is misbehaving due to a hardware issue, the app’s administrator might spin up a new database server restored from a recent backup. The current production database could be detached, and the new database attached – all without any code changes.”
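As a minimal sketch of what "attached resource" means in practice, the Python below reads a backing service's location from the environment instead of hard-coding it. The variable name DATABASE_URL, the host names, and the helper function are illustrative assumptions, not anything prescribed by 12factor.net or by Apcera.

```python
import os
from urllib.parse import urlparse


def backing_service_from_env(var_name="DATABASE_URL", environ=None):
    """Return connection details for an attached resource.

    The resource is located only through configuration, so the app
    code never names a specific server. `var_name` is a hypothetical
    convention chosen for this example.
    """
    environ = os.environ if environ is None else environ
    url = urlparse(environ[var_name])
    return {
        "scheme": url.scheme,            # e.g. "mysql"
        "host": url.hostname,
        "port": url.port,
        "name": url.path.lstrip("/"),    # database name
    }


if __name__ == "__main__":
    # Hypothetical values: swapping the database is a config change only.
    env = {"DATABASE_URL": "mysql://app:secret@db-old.example.internal:3306/orders"}
    print(backing_service_from_env(environ=env)["host"])

    env["DATABASE_URL"] = "mysql://app:secret@db-restored.example.internal:3306/orders"
    print(backing_service_from_env(environ=env)["host"])
```

The swap costs the application nothing, which is exactly why it says so little about whether the resource behind the URL can handle the load.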

The basic advice is to “Make sure you can swap out a resource painlessly,” which is a sensible approach from the app perspective, but it doesn’t actually help you scale those attached resources. The expectation is that you can always swap out one resource for another, and that as your system scales up you can easily swap in something that can handle the increased load.

Factor VI, Execute the app as one or more stateless processes (http://12factor.net/processes), goes on to say that “any data that needs to persist must be stored in a stateful backing service, typically a database.” The reason for this is simple: it’s hard to scale a file system with many writers. An app designed to have multiple random writers to the same file will not scale because at some point the contention for that single file will become a bottleneck and the app will need to be redesigned.

However, read-only access to a shared file system may be perfectly reasonable for a 12-factor app, especially if the data being read is very large. Data ingest systems may need to mount a shared file system. Apps ported from legacy systems may need a shared file system while being re-architected. 12-factor apps that share file system data with other, non-cloud-based systems may need to mount a traditional file system. If your application requires a shared, persistent file system, make sure that your platform is flexible enough to support your needs.

Let’s say you determine that a persistent file system resource is a requirement for your new application. Is it reasonable to expect that any file system can scale in any way that may be required? Can you scale a file system regardless of what types of increasing loads you place on it? If you need to swap out one file system resource for another, can you transparently migrate your data to the new resource without taking your app offline -- or at least minimize the maintenance window?

When it comes to file systems, there are a number of things that may need to scale depending on your use case. You may be faced with one or more of the following:

  • Increasing number of bytes to store -- growing from gigabytes to exabytes
  • Increasing number of files stored within a single directory
  • Increasing number of files stored across the entire file system
  • Increasing number of small files -- affects performance of striping, indexing
  • Increasing number of large files -- may impact the distribution of free space
  • Increasing number of files of varying sizes -- performance impacts
  • Increasing or spiking concurrent read accesses
  • Increasing or spiking concurrent write accesses
  • Increasing or spiking number of files created per minute
  • Increasing or spiking number of files updated per minute
  • Increasing or spiking number of files deleted per minute
  • Changing group of “highly active” files
  • Changing group of “rarely active” files
  • Growing group of “never active” files
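Several of the dimensions in the list above can be measured on an existing data set before you choose a file system. The following is a rough sketch, assuming a locally mounted directory tree; the function name and the 64 KiB "small file" threshold are arbitrary choices made for illustration.

```python
import os
from collections import Counter

# Assumed cutoff for a "small" file; tune it for your own workload.
SMALL_FILE_BYTES = 64 * 1024


def profile_tree(root):
    """Walk a directory tree and summarize a few scaling dimensions."""
    total_bytes = 0
    total_files = 0
    small_files = 0
    files_per_dir = Counter()

    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.lstat(path).st_size
            except OSError:
                continue  # file vanished or is unreadable; skip it
            total_files += 1
            total_bytes += size
            files_per_dir[dirpath] += 1
            if size < SMALL_FILE_BYTES:
                small_files += 1

    return {
        "total_bytes": total_bytes,
        "total_files": total_files,
        "small_files": small_files,
        "max_files_in_one_dir": max(files_per_dir.values(), default=0),
    }


if __name__ == "__main__":
    print(profile_tree("."))
```

Running this periodically and comparing the results gives a crude growth trend for bytes stored, file counts, and directory fan-out. It does not capture access patterns such as concurrent reads or writes, which need monitoring at the file system or application level.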

Not every file system solution excels at scaling every one of these factors in a cost-effective manner. There are vendors in the HPC market that will sell you a system that can do all of the above, but you’ll probably pay more for that storage than you can make back in profit from the product you’re building. And in most of these file systems, trust and policy are an afterthought.

Let’s start with functionality and consider factors that can make a system highly available while scale is increasing. You don’t want your site going down because a disk failed or a host went offline, so you will need a certain amount of redundancy in the system. With redundancy come tradeoffs.

Increase the quantity of redundant hardware and you increase your hardware cost.

Reduce the amount of hardware, for instance by using erasure coding to reduce the number of disks, and you trade lower hardware costs for lower performance during recovery and an increased risk of data loss. Recovery takes longer after you replace failed hardware, because the lost data must be read and recomputed from multiple disks. That work increases both IO and CPU load, lowering performance while recovery runs. Data durability goes down: the risk of some data loss goes up, and a second failure during recovery increases the chance of catastrophic data loss.
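To put rough numbers on the hardware-versus-recovery tradeoff, here is a back-of-the-envelope sketch. It assumes a Reed-Solomon-style code in which any k of the k + m fragments can rebuild the data; the 3-copy and 4+2 layouts are example parameters, not recommendations, and real systems differ in the details.

```python
def replication_profile(copies):
    """Storage cost and rebuild cost for plain N-way replication."""
    return {
        "raw_bytes_per_user_byte": float(copies),
        "failures_tolerated": copies - 1,
        # A lost copy is rebuilt by reading one surviving copy.
        "fragments_read_to_rebuild_one": 1,
    }


def erasure_coding_profile(data_fragments, parity_fragments):
    """Storage cost and rebuild cost for a k + m erasure code.

    Assumes any k fragments are enough to reconstruct the data.
    """
    k, m = data_fragments, parity_fragments
    return {
        "raw_bytes_per_user_byte": (k + m) / k,
        "failures_tolerated": m,
        # Rebuilding one lost fragment reads k surviving fragments
        # and recomputes the missing one, costing extra IO and CPU.
        "fragments_read_to_rebuild_one": k,
    }


if __name__ == "__main__":
    print("3 copies:", replication_profile(3))
    print("4+2 code:", erasure_coding_profile(4, 2))
```

Under these assumptions both layouts survive two failures, but the 4+2 code stores 1.5 raw bytes per user byte instead of 3, and pays for it by reading four fragments to rebuild one. The longer and heavier the rebuild, the longer the window in which a further failure can occur.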

Distribute the storage load across many systems and you’ll need to decide what tradeoffs you want to make between consistency, availability, and partition-tolerance.

For these reasons it’s important to know what you need to scale so you can choose a file system that can economically scale the things that matter to your application. A file system that must serve millions of concurrent reads of a few files (static assets for the front page of a popular web site) is going to look very different from one that must store thousands of petabytes of small files which, although rarely accessed, require very low latency retrieval when they are accessed (any photo-sharing site).

And then there is the trust factor. With swappability comes variability. It may be entirely appropriate that the databases bound to a test bed version of a new cluster of services be guardrailed into running only inside a private-cloud environment. Or perhaps it is preferable to run them all in a VPC sandbox in an Amazon Web Services data center for test and development and then only allow them to run in a geographically different data center when they move into production. The point is that, as services are attached to, removed from, and reattached to any file system, policy control over those services is essential if an enterprise is going to have full transparency into and control over how and where these applications run, and on what underlying software technologies.
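One way to picture this kind of guardrail is as a deny-by-default check applied whenever a service is bound. The sketch below is purely hypothetical: the stage names, location labels, version string, and data structures are invented for illustration and do not represent Apcera's policy language or any real product API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BindingRequest:
    service: str    # e.g. "mysql"
    stage: str      # lifecycle stage of the workload
    location: str   # where the service would run
    version: str    # software version of the service


# Hypothetical policy: per stage, the allowed locations and versions.
POLICY = {
    "test": {
        "locations": {"private-cloud", "aws-vpc-sandbox"},
        "versions": {"5.7"},
    },
    "production": {
        "locations": {"dc-east"},
        "versions": {"5.7"},
    },
}


def is_binding_allowed(request, policy=POLICY):
    """Allow a binding only if the stage has a rule that permits it."""
    rule = policy.get(request.stage)
    if rule is None:
        return False  # unknown stage: deny by default
    return (
        request.location in rule["locations"]
        and request.version in rule["versions"]
    )


if __name__ == "__main__":
    ok = BindingRequest("mysql", "test", "private-cloud", "5.7")
    bad = BindingRequest("mysql", "test", "dc-east", "5.7")
    print(is_binding_allowed(ok), is_binding_allowed(bad))
```

Keeping the rules in data rather than in application code means a resource can still be swapped freely, while each swap is checked against what the enterprise has approved.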

Treating a file system as just another attached resource that can always be scaled up to match your application’s needs is not realistic. If you want your file system to scale smoothly, make sure that you have a replacement strategy before you start: How should you migrate your data to minimize or eliminate downtime? How will your data be backed up, and what sort of disaster recovery plan do you need to implement? Thinking these questions through before you start implementing your application will minimize scaling problems as your application’s popularity grows.

Developers who thoughtfully follow the methodology of the 12 Factor App in selecting and configuring their file system can create applications that move quickly from development to production environments, have the mobility to cross between different cloud environments, and scale -- while maintaining trust.

Earl Ruby is a Principal Software Engineer at Apcera. Currently focused on cloud storage, Earl has also developed reference storage architectures for Seagate, created web-based settlement, network quality of service and fraud-detection software for the wholesale telecom industry, and has helped build wireless telemetry networks for electric, gas, and water utilities at Schlumberger.

 

 

EnterpriseAI