Advanced Computing in the Age of AI | Friday, March 29, 2024

No, Operations Isn’t Going Anywhere, But it’s Going to Look Different 

There was a time when developers needed convincing that they should care about operational concerns. Since then, much of the effort of the DevOps movement has gone toward getting developers to solve key operational problems, especially deployment, earlier in the lifecycle.

Now the pendulum is swinging back. A growing segment of developers proclaims that today’s developments in cloud native and serverless computing make operations a relic of the past.

The tooling, the tasks, the organizational boundaries, and even the name “operations” may be changing. However, these assertions about the demise of operations as a distinct craft and professional role are unrealistic and somewhat naive.

Here’s why the expected demise of operations is greatly exaggerated.

1. Abstractions Are Leaky

Practitioner and author Cindy Sridharan (@copyconstruct) first made the connection for me between programming abstractions and operations abstractions. Cindy’s argument starts with Joel Spolsky's famous point that all non-trivial abstractions are leaky. The fact that all abstractions leak means that to expertly use them you need to know the details about what goes on beneath the abstraction.

Cindy points out that operations tools, platforms and automation are abstractions on top of a broad set of technologies. These are non-trivial; these will leak. Operations professionals are the experts who understand the nuances and wrestle with the complexity beneath the abstractions so that their developer colleagues can be productive.

2. Serverless Doesn't Get Rid of Operations, it Just Makes it Look Different

Building on the theme of leaky abstractions, Patrick Debois (@patrickdebois) has been examining his use of cloud-native services and serverless platforms on Twitter, and his ongoing commentary is eye-opening.

Patrick, who coined the term "DevOps" and kicked off the movement, has been tweeting and speaking about his experiences building and running a significant business using mostly serverless technologies in the cloud. Patrick's journey reinforces the point about abstractions being leaky.

Patrick is required to explore the known and unknown behaviors of the myriad of services he uses. That is operations work. His commentary shows it’s an ongoing and time-consuming job. It is also a real-time job, not just a design-time job. Services change unexpectedly, SLAs are broken, performance degrades, and things just don't do what you expect them to do. Someone is going to have to respond on the fly, triage, and adapt.

Scale this up to enterprise level with hundreds of engineers working on multiple business lines, and you will need dedicated specialists. Those are operations specialists doing operations work.

3. Deployment Is Just One Part of Operations

Developers have historically held a reductionist view that deployment equals operations. In this view, deployment is the finish line. If there is a problem, just deploy again with a different version.

Spend some time in larger enterprises, and you will quickly see that there is a range of necessary day-to-day operations activities beyond code deployments. It’s a long list, including: responding to alerts, investigating performance, capacity planning, responding to ad-hoc business requests, managing caches, managing CDNs, configuring DNS services, managing SSL certs, managing proxies, managing firewalls/networks, running message systems, and more.

Someone has to expertly and coherently manage all of these. Moving to the cloud doesn’t get rid of most of this; it complicates matters by adding external parties. This is operations work, and you are going to need operations specialists.

4. Legacy Is Inescapable

Legacy is the historical record of an organization's success. The longer you are in business, the more you will accumulate legacy code, platforms, processes, skills, and people.

Enterprises are a web of legacy. Very little lives in isolation. Everything of significance has to hang together at runtime to keep the business going. No matter how well you think you know this complex web of dependencies, the behavior will be unexpected.

Someone has to holistically care for and feed all of these disparate systems composed of different technologies (from mainframes to serverless in some cases), built from different points of view by different people (often via acquisition) who may not still be around. That is operations work, requiring operations specialists.

5. AI and Automation Won't Save Us

We can't forget we are dealing with a complex system. An enterprise is two complex systems interacting to form an even more complex system.

One is the complex technical system (interactions between hardware, software, network, user traffic) and the other is a complex system of people working on those underlying technical components.

Industry experts have been doing research around mapping the lessons from managing complex systems in traditional high-consequence domains (aviation, healthcare, manufacturing, etc.) to IT. Their results? Automation has its limits and we need humans to be experts at operating complex systems.

So What’s Changing?

In the end, what is actually happening? The work of operations is changing and the skills required to do that work are changing. The platforms and tools involved are evolving (but don't forget the decades of legacy code that isn't!). Organizational silos are breaking down, and developers and operators are commingling as peer engineers. This is an exciting time for everyone in IT. However, let's not get carried away, my friends in development. As a craft and a professional role, operations isn't going anywhere. And you'll be glad it isn't.

Damon Edwards is co-founder of Rundeck.

EnterpriseAI