How many pipelines do I need!?!
Modern information systems and applications with fully automated back ends require a lot of pipelines. I usually think of at least five categories of pipelines that they need, and I define them broadly in this way:
- Application code
- Infrastructure code
- Secrets
- Data
- Content
This has stood the test of time for me as a solid rubric for judging whether an application is well designed for CI/CD. If there are fewer than five clear pipelines, you will find pain points and automation targets where they are missing. If there are more than these five, there may be opportunities to consolidate and simplify, although too many pipelines is usually better than too few. For instance, configuration often needs its own pipeline in highly dynamic systems and in systems with a high level of configuration complexity or change.
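As a rough illustration of using this rubric, here is a minimal Python sketch. The five category names are the ones defined in this article (application code, infrastructure code, secrets, data, content); the function name `audit_pipelines` and the example input are hypothetical.

```python
# The five categories are the ones this article defines; everything else
# here is a hypothetical illustration of applying the rubric.
REQUIRED_PIPELINES = ("application", "infrastructure", "secrets", "data", "content")


def audit_pipelines(implemented):
    """Return (missing, extra) pipeline categories for a system."""
    implemented = {name.lower() for name in implemented}
    # Missing categories are the likely pain points and automation targets.
    missing = [name for name in REQUIRED_PIPELINES if name not in implemented]
    # Extra categories are candidates to review, not necessarily to remove.
    extra = sorted(implemented - set(REQUIRED_PIPELINES))
    return missing, extra


missing, extra = audit_pipelines(["application", "infrastructure", "configuration"])
print("automation targets:", missing)
print("review or consolidate:", extra)
```

An extra entry such as a configuration pipeline is not a defect in itself; the check only points at where to look.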
Let's quickly define an IT pipeline and contrast it with other meta-systems to make sure the context is understood.
Pipeline: A direct channel of information, assembly, development and/or resources that supplies an end state entity.
The characteristics of an IT pipeline are different from those of, say, a more traditional assembly line. In a pipeline there is a concept of constant, untended flow. Once the pipeline is set it runs non-stop by default. You may put in stops and stages for control, but these are additions to the default state: you should be able to remove them and the pipeline still runs just fine from beginning to end. In an assembly line, by contrast, there are manually run stages (work stations) that by definition are associated with an operator (assembler) who controls the rate of production passed to the next stage. In a simplified understanding of the assembly line, work proceeds at the velocity of its slowest stage, which can be anywhere in the line and can be either an automated factor or a human factor. In a simplified view of a pipeline, work proceeds at the velocity allowed by the end bandwidth of the pipeline, which is as wide open as possible in the default state.
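The two simplified views can be written down as a tiny model. This is only a sketch of the comparison above, not a real throughput calculation; the function names and the numbers are made up for illustration.

```python
def assembly_line_rate(stage_rates):
    """Units per hour for an assembly line: limited by its slowest station."""
    return min(stage_rates)


def pipeline_rate(end_bandwidth, gate_rates=()):
    """Units per hour for a pipeline: the end bandwidth by default.

    Gates (manual approvals, staged rollouts) are optional additions. Each
    one can only lower the rate, and removing them restores the default flow.
    """
    return min((end_bandwidth, *gate_rates))


print(assembly_line_rate([40, 12, 30]))      # slowest station wins
print(pipeline_rate(100))                    # default state: wide open
print(pipeline_rate(100, gate_rates=[25]))   # an added control stage
```

The point of the default argument is that the gates are opt-in: a pipeline with no gates is still a complete pipeline.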
So I separate the pipelines along logical and practical lines like so:
Application Code Pipeline
Application = Custom or modified code that must be validated for quality/security and compiled, or assembled, or packaged for execution.
Pipeline = The pipeline begins at the version control system and ends at the client or server where the code is executed. This is what is commonly understood as a CI/CD pipeline.
Infrastructure Code Pipeline
Infrastructure = The systems that deliver, run, sustain and/or recover an application. In this case it is defined as code; whether imperative or declarative, it must be fully codified.
Pipeline = In this case the pipeline is something that runs at a rate determined by the application it supports. This is also normally considered part of a CI/CD pipeline.
Secrets Pipeline
Secrets = Sensitive information used by the applications or infrastructure to operate. This is a logical and practical separation that ensures greater care around exposure and makes dedicated monitoring of use easier.
Pipeline = Usually a secret manager of some sort. Delivery and segmentation should be controlled with strict role-based access controls and as many security defenses as possible. Commonly missing or poorly implemented in continuous systems.
Data Pipeline
Data = In the sense of raw data: sources, stores, and the ETL (extract/transform/load) modules that prepare the data for consumption by the application.
Pipeline = For data there is usually a combination of stream and batch processes, depending on needs and resources, but this is still a real pipeline that can be fully automated. Often ad hoc and incomplete.
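To make the ETL idea concrete, here is a minimal Python sketch of the three steps wired together as generators. The CSV layout, the field names, and the in-memory `store` are all assumptions for illustration; a real pipeline would read from actual sources and write to an actual data store.

```python
import csv
import io

# Hypothetical raw input; the second row is deliberately malformed.
RAW_CSV = "user_id,amount\n1,10.50\n2,not-a-number\n1,4.50\n"


def extract(text):
    """Extract: yield raw rows one at a time from a CSV source."""
    yield from csv.DictReader(io.StringIO(text))


def transform(rows):
    """Transform: coerce types and drop rows that fail validation."""
    for row in rows:
        try:
            yield {"user_id": int(row["user_id"]), "amount": float(row["amount"])}
        except (KeyError, ValueError):
            continue  # a real pipeline would route this row to a reject store


def load(records, store):
    """Load: aggregate per-user totals into the target store."""
    for rec in records:
        store[rec["user_id"]] = store.get(rec["user_id"], 0.0) + rec["amount"]
    return store


totals = load(transform(extract(RAW_CSV)), {})
print(totals)
```

Because each step consumes one record at a time, the same `transform` and `load` functions can sit behind a batch file or a streaming source, which is one way to keep the stream and batch halves from drifting apart.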
Content Pipeline
Content = To distinguish it from data, content is human understandable: it is the app's communication with its human consumers or operators/admins. Data can be content, but only in a human-recognizable form, such as a PNG image of a cat rendered on screen rather than its binary representation.
Pipeline = Unlike most pipelines, this one usually requires human intervention in the creation, moderation or editing of the content, so specific workflows need to be designed to facilitate this. It can in principle run without human intervention, but in practice it is rarely more than semi-automated. More common in user-facing applications, but often missing in internally facing applications.
While this is usually the minimum set of pipelines for most applications, there is also a sixth common pipeline that I left out because of the wide-ranging implications of implementing it: the configuration pipeline. It isn't always required, because the configuration is often a subset found in other pipelines such as infrastructure or secrets, and the configuration change tempo is commonly the same as that of the parent app pipeline. But external configuration managers are their own distinct pipelines, and using them can make your applications more consistent and easier to manage. It's also a best practice to separate configuration from application code, so a separate system is required, and usually this should be a pipeline capable of accepting rapid changes in its own right.
There is a wide-ranging set of problems with configuration managers (the software, not the people) that mostly spring from the business level and can have very negative impacts at the engineering level. Configuration can become a point of central governance, which sounds great from a security and business perspective, but can quickly become one of the core reasons teams begin to isolate, form silos and set up ticketing systems and workflows for making changes to centrally controlled resources.
The most common way of implementing configuration pipelines that can potentially avoid the problematic control structures that grow around them is a code workflow solution such as GitOps. Automated tests and code merging workflows provide non-repudiation, logging, versioning and appropriate oversight, while placing them in a pipeline ensures rapid delivery with automated testing assurances.
Configuration Pipeline
Configuration = Desired system state options within systems, subsystems and applications.
Pipeline = This is often implicit in a configuration manager, which is very similar to a secrets manager but often contains vastly more data. What turns it into a pipeline is adding automated testing and default flow for these concerns.
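As a sketch of what automated testing for configuration might look like, the Python below validates a proposed configuration change before it is allowed to flow onward, the kind of check a merge workflow could run. The schema, the key names, and the `validate_config` function are hypothetical and not taken from any particular configuration manager.

```python
import json

# Hypothetical schema: key -> (expected type, validity check).
SCHEMA = {
    "log_level": (str, lambda v: v in {"debug", "info", "warning", "error"}),
    "max_connections": (int, lambda v: 1 <= v <= 10_000),
    "feature_x_enabled": (bool, lambda v: True),
}


def validate_config(raw_json):
    """Return a list of problems; an empty list means the change may flow on."""
    config = json.loads(raw_json)  # assumes the document is a JSON object
    problems = []
    for key, (expected_type, is_valid) in SCHEMA.items():
        if key not in config:
            problems.append(f"missing key: {key}")
        elif type(config[key]) is not expected_type:  # strict: bool is not int
            problems.append(f"{key}: expected {expected_type.__name__}")
        elif not is_valid(config[key]):
            problems.append(f"{key}: invalid value: {config[key]!r}")
    problems += [f"unknown key: {k}" for k in sorted(config.keys() - SCHEMA.keys())]
    return problems


good = '{"log_level": "info", "max_connections": 200, "feature_x_enabled": true}'
bad = '{"log_level": "verbose", "max_connections": "200"}'
print(validate_config(good))
print(validate_config(bad))
```

Running a check like this on every proposed change gives oversight without a ticket queue: the rules are reviewed once as code, and every later change is tested against them automatically.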
The configuration pipeline is also problematic because of how much it can overlap with the infrastructure pipeline. Especially in today's abstracted cloud environments, the infrastructure code can easily be viewed as pure configuration. So if you are going to implement a distinct configuration pipeline, you need a clear delineation of where the hard infrastructure systems end and where the configuration parameters begin.
That's a lot of pipelines...
The problem with most management systems that are not pipelines is the lack of built-in automated testing and default end-to-end flow. Changing an isolated secret in a secret manager shouldn't break a system that is supposed to pull the secret as needed. But if an app is built around a long-lived secret, doesn't check the secret manager for a changed secret, and doesn't retry when authentication fails, you will have a failure. So all pipelines should have automated testing environments, preferably ephemeral, that provide assurance of the quality and reliability of every change that flows through them.
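The secret rotation failure described above can be sketched in a few lines of Python. Everything here is a stand-in: `FakeSecretManager`, `FakeBackend`, `AuthError` and the secret name are hypothetical and do not represent any real secret manager's API. The part that matters is the client's behavior of re-fetching the secret and retrying when authentication fails.

```python
class AuthError(Exception):
    """Raised by the (hypothetical) backend when a credential is rejected."""


class FakeSecretManager:
    """Stand-in for a real secret manager; holds the current secret value."""

    def __init__(self, value):
        self.value = value

    def get(self, name):
        return self.value


class FakeBackend:
    """Stand-in for a database or API that only accepts the current secret."""

    def __init__(self, manager):
        self.manager = manager

    def query(self, secret):
        if secret != self.manager.value:
            raise AuthError("credential rejected")
        return "ok"


class Client:
    def __init__(self, manager, backend, secret_name="db-password"):
        self.manager, self.backend, self.secret_name = manager, backend, secret_name
        self.secret = manager.get(secret_name)  # cached at startup

    def query(self, max_attempts=2):
        for attempt in range(max_attempts):
            try:
                return self.backend.query(self.secret)
            except AuthError:
                if attempt == max_attempts - 1:
                    raise  # out of attempts: surface the failure
                # The cached secret may be stale: re-fetch it and retry.
                self.secret = self.manager.get(self.secret_name)


manager = FakeSecretManager("v1")
client = Client(manager, FakeBackend(manager))
manager.value = "v2"    # the secret is rotated behind the app's back
print(client.query())   # first attempt is rejected, the retry succeeds
```

A test like this, run in an ephemeral environment whenever a secret changes, is the kind of assurance that separates a secrets pipeline from a bare secret manager.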
Most projects and systems that I've encountered only implement a subset of these application sub-systems as pipelines. And it's completely fine to use a management system that isn't clearly a pipeline for most of these six areas of concern; that can work and stay completely manageable. CI/CD, especially when you move beyond just the app code pipeline, is a business decision as much as an IT decision. There are very few real-world use cases where every change in a system needs to travel immediately to production. Such speed to prod might be useful for incidents, but not for day-to-day business.
It's important to point out that pipelines, assembly lines, and ad hoc development/deployment are all just ways of moving complexity from one form to another. Choosing the right technique for supporting a feature is a balancing act, but only feature deletion actually removes complexity. The right answer is the one that works for you and, as is often the case, works for the organization you work within.