Advanced Computing in the Age of AI | Wednesday, April 24, 2024

Fighting the Cloud Vendor Lock-In Headache 

Vendor lock-in is one of the biggest concerns for enterprises making the move to the public cloud. IT leaders have spent years under the yoke of database giants and hardware vendors, so they are right to be skittish about getting stuck in a single public cloud. But before enterprises begin leveraging multiple public cloud vendors, they need to understand the risks of deploying multiple clouds at a time when IT teams are already struggling to govern existing solutions.

Choosing the right cloud strategy means balancing vendor lock-in fears against manageability and governance. That strategy can be broken down into three steps: choose (and perfect) your primary cloud, reimagine the role of central IT, and then automate everything.

Step 1: Choose (and perfect) your primary cloud vendor.

Chances are that some of your teams are already experimenting with more than one public cloud vendor. As central IT begins to formalize and expand this usage, it is important that you begin by getting one cloud right -- in production -- before you attempt to use multiple clouds for the same workload or business unit.

The hard part about migrating to any IaaS platform is not the technology itself but the governance models, cost control measures and the processes your systems and development teams use to work together. You need to let your engineers learn your primary cloud platform in its entirety; experiment with various workloads; and learn to manage, audit and control one cloud before you introduce additional complexity and risk by adding a second cloud. Developers and cloud engineers may be able to keep up, but compliance and finance teams will not.

Step 2: Reimagine the role of central IT.

Central IT needs to select the primary cloud vendor and guide teams with appropriate training and documentation. Usually this takes the form of a security playbook or service catalog, where users can launch resources from predefined templates with security best practices built in. This follows the service-oriented IT model, in which central IT becomes the service provider, or PaaS layer, through which teams consume IaaS.

There are two technical features of this model that are important to mention:

  1. Cloud templates with pre-baked security, network and resource parameters, and a software development-like process to store, modify and deploy new templates in a central repository.
  2. A centrally shared services area in your cloud environment that hosts security software, logs, monitoring tools, etc., so that multiple teams have access to the same set of resources.

These two components alone will significantly accelerate each team’s cloud efforts.
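
As a rough illustration of the first component, here is a minimal Python sketch of a template catalog. Everything in it is a hypothetical stand-in rather than any vendor's API: the `TEMPLATES` structure, the `launch_request` function and the parameter names are assumptions made for this example. It shows one way a template could lock its security settings while letting teams override only approved parameters.

```python
import copy

# Hypothetical catalog entry (illustrative names only): security and
# network settings are pre-baked and cannot be changed by the requester.
TEMPLATES = {
    "web-server": {
        "locked": {
            "encrypt_disks": True,
            "public_ssh": False,
            "log_target": "shared-services",
        },
        "defaults": {"instance_size": "small", "count": 1},
        "allowed_overrides": {"instance_size", "count"},
    }
}


def launch_request(template_name, overrides=None):
    """Return the final parameters for a launch, enforcing locked settings."""
    template = TEMPLATES[template_name]
    overrides = overrides or {}

    # Reject any attempt to change a parameter central IT has not opened up.
    rejected = set(overrides) - template["allowed_overrides"]
    if rejected:
        raise ValueError(f"parameters not overridable: {sorted(rejected)}")

    params = copy.deepcopy(template["defaults"])
    params.update(overrides)
    # Locked settings are applied last so they always win.
    params.update(template["locked"])
    return params
```

A team could call `launch_request("web-server", {"count": 3})` to scale out, while a request that tries to set `public_ssh` would be refused. In practice the catalog would live in a central repository and go through review like any other code.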

Why is central IT the key player in preventing vendor lock-in? It is all about defining standards for automation and governance that transcend the boundaries of platforms or cloud vendors. Getting the platform wrong can cost you one to three months of work. Getting the standards right will serve you for years to come. If you ever need to migrate out of your primary cloud provider, you will have a central entity that knows where workloads live and can intelligently recreate practices on a second platform.

If this ship has already sailed and multiple clouds are already in use in production, now is the time for central IT to step in and involve GRC (governance, risk management and compliance) and security teams in producing these central resources. Any work you do on this front will make it easier to manage or migrate in or out of your cloud.

Step 3: Automate everything.

When you implement templatization and configuration management, you build a set of management standards that will serve you in every IaaS platform. You are prioritizing automation over manual work and building clouds with code, principles that make clouds both easy to manage and easy to get out of quickly.

Here is an ideal scenario: Your compute resources are built from a template by an open-source or cloud-native tool. The template is just a JSON file that describes virtual resources and could, in theory, be configured to build any cloud resource. You then bootstrap those instances with a configuration management tool like Puppet or Chef, which you can take with you when you leave, with some alterations.
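
The scenario above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a real provisioning tool: the JSON layout, the `build_plan` function and the provider labels `cloud-a` and `cloud-b` are all hypothetical, and the code only prints a plan instead of calling any cloud or Puppet API.

```python
import json

# Hypothetical template format, invented for this example.
TEMPLATE_JSON = """
{
  "resources": [
    {"name": "app-1", "type": "compute", "size": "small"},
    {"name": "app-2", "type": "compute", "size": "small"}
  ],
  "bootstrap": {"tool": "puppet", "role": "app-server"}
}
"""


def build_plan(template_text, provider):
    """Turn a JSON template into an ordered list of human-readable steps."""
    template = json.loads(template_text)
    steps = []

    # Provisioning steps depend on the provider that builds the resources.
    for res in template["resources"]:
        steps.append(
            f"{provider}: create {res['type']} '{res['name']}' size={res['size']}"
        )

    # Bootstrap steps depend only on the configuration management tool.
    boot = template["bootstrap"]
    for res in template["resources"]:
        steps.append(f"{boot['tool']}: apply role '{boot['role']}' to '{res['name']}'")

    return steps


if __name__ == "__main__":
    for step in build_plan(TEMPLATE_JSON, "cloud-a"):
        print(step)
```

Calling `build_plan` with `cloud-a` and then `cloud-b` changes only the provisioning steps; the bootstrap steps are identical. That is the portable part of the setup, though a real migration would still need the alterations mentioned above.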

There is much value in building skills in portable tools like Puppet that apply across multiple clouds. However, one of the main ways that technology teams attempt, mistakenly, to avoid vendor lock-in is by using only “basic” cloud services and avoiding AWS- or Azure-native tools.

The idea is that if you use your own queuing, notification, networking tools, etc., you can get out of AWS or Google or Azure quickly and easily. But the truth is, you spend the same amount of time -- if not more -- upfront building or re-architecting your own tools. Then you have to manage those tools, update them, and improve them over time. You are better off becoming adept at multiple cloud tools simultaneously, and then moving from one cloud vendor to another as you see fit.

The benefit of the public cloud is not just outsourced compute and storage. It is that public cloud providers like AWS and Google have spent the last 10 years building advanced, scalable infrastructure services -- and they release hundreds of updates to these services every year. You do not have to build your own version of these services. You never have to upgrade your cloud to get these new features. You do not have to patch them or version them. They just show up.

All vendors have an agenda. All vendors want you to purchase more of their services. This is true whether you are buying your own hardware in your on-premises datacenter or migrating to the cloud. But do not let fear of vendor lock-in prevent you from getting all the business advantages from public cloud.

EnterpriseAI