DZone Trend Report: DevOps 2023

Table of Contents

HIGHLIGHTS AND INTRODUCTION

Welcome Letter: Conflicted? So Are We.
Jason Cockerham, Community Engagement Manager at DZone

About DZone Publications

DZONE RESEARCH

Key Research Findings: An Analysis of Results From DZone's 2023 DevOps and CI/CD Survey

[Comic] Dev vs. Ops
Daniel Stori, Software Development Manager at AWS

Source Code Management for GitOps and CI/CD
Yitaek Hwang, Software Engineer at NYDIG

CI/CD for Data Science and Machine Learning: Leverage the Benefits of Automation
John Nielsen, Developer at J.Nielsen
Welcome Letter

Ah, DevOps. It's one of those things that every developer can agree has revolutionized how we build, test, and deploy software, but no one can really agree on what it is.

Sure, there are definitions out there, but I (and nearly every developer colleague I ask about it) have yet to find one that perfectly captures what it is or even one that most everyone can agree on.

But that's OK. We don't really need a complete definition of it because it's not really a thing that needs to be defined. It's more of a concept, a methodology, a mindset — if you will. What's important about DevOps is the ideas we can take from it, how we use them to make our jobs easier, and how that can change the way we think about and approach what we do.

It's almost like a really great movie or book. We've all seen or read one that had a profound impact on us, or at least made us think about something important. Everyone else who has seen or read it probably walked away with something different from it even though the story, the characters, the ending, etc., were all exactly the same.

Think about when you went back and watched that movie or read that book again several years later and how it impacted you in a different way. Similarly, the DevOps world of 2023 will differ and teach us other things than the DevOps of the past.

And just like a good movie or book, there are several varying parts and aspects of it that must all work together to tell a complete story. Every part of the DevOps story is important, and they impact each of us differently.

In our 2023 DevOps: CI/CD, Application Delivery, and Release Orchestration Trend Report, we'll explore how the nuances of all the different aspects of DevOps — Infrastructure as Code (IaC), shift left, source code management, CI/CD, and more — are continuing to change how we build, test, and deploy code today.

So grab the popcorn and an ice-cold soda, and enjoy the DZone 2023 DevOps Trend Report!

Thanks for joining us,

Jason Cockerham, Community Engagement Manager at DZone
DZone Publications
Caitlin works with her team to develop and execute a vision for DZone's content strategy as it pertains to DZone publications, content, and community. For publications, Caitlin oversees the creation and publication of all DZone Trend Reports and Refcards. She helps with topic selection and outline creation to ensure that the publications released are highly curated and appeal to our developer audience. Outside of DZone, Caitlin enjoys running, DIYing, living near the beach, and exploring new restaurants near her home.

Melissa leads the publication lifecycles of Trend Reports and Refcards — from overseeing workflows, research, and design to collaborating with authors on content creation and reviews. Focused on overall Publications operations and branding, she works cross-functionally to help foster an engaging learning experience for DZone readers. At home, Melissa passes the days reading, knitting, and adoring her cats, Bean and Whitney.

Lindsay Smith
Senior Publications Manager at DZone
@DZone_LindsayS on DZone
@lindsaynicolesmith on LinkedIn

Lindsay oversees the Publication lifecycles end to end, delivering impactful content to DZone's global developer audience. Assessing Publications strategies across Trend Report and Refcard topics, contributor content, and sponsored materials — she works with both DZone authors and Sponsors. In her free time, Lindsay enjoys reading, biking, and walking her dog, Scout.

Lauren Forbes
Content Strategy Manager at DZone
@laurenf on DZone
@laurenforbes26 on LinkedIn

Lauren identifies and implements areas of improvement when it comes to authorship, article quality, content coverage, and sponsored content. She also oversees our team of contract editors, which includes recruiting, training, managing, and fostering an efficient and collaborative work environment. When not working, Lauren enjoys playing with her cats, Stella and Louie, reading, and playing video games.

Lucy Marcum
Publications Coordinator at DZone
@LucyMarcum on DZone
@lucy-marcum on LinkedIn

As a Publications Coordinator, Lucy spends much of her time working with authors, from sourcing new contributors to setting them up to write for DZone. She also edits publications and creates different components of Trend Reports. Outside of work, Lucy spends her time reading, writing, running, and trying to keep her cat, Olive, out of trouble.

Jason Cockerham
Community Engagement Manager at DZone
@Jason Cockerham on DZone
@jason-cockerham on LinkedIn

Jason heads the DZone community, driving growth and engagement through new initiatives and building and nurturing relationships with existing members and industry subject matter experts. He also works closely with the content team to help identify new trends and hot topics in software development. When not at work, he's usually playing video games, spending time with his family, or tinkering in his garage.
Key Research Findings
An Analysis of Results From DZone's 2023 DevOps and CI/CD Survey

By G. Ryan Spain, Freelance Software Engineer, Former Engineer & Editor at DZone
In January 2023, DZone surveyed software developers, architects, and other IT professionals in order to understand the state of
DevOps and continuous integration and continuous delivery (CI/CD).
Methods: We created a survey and distributed it to a global audience of software professionals. Question formats included
multiple choice, free response, and ranking. Survey links were distributed via email to an opt-in subscriber list, popups on
DZone.com, the DZone Core Slack workspace, and various DZone social media channels. The survey was opened on January
10th and ended on January 25th; it recorded 201 complete and partial responses.
In this report, we review some of our key research findings. Many secondary findings of interest are not included here.
2. DevOps is often as much about organizational structure as it is about automation and toolchains. Who is in charge of
ensuring proper test coverage? Of approving pull requests? Of monitoring deployments? We tried to get a sense of who
handled deployments and how that impacted the development pipeline.
3. Understanding what is succeeding and what is failing in the delivery pipeline requires metrics tracking. To help
learn where organizations' priorities lay, we asked respondents what deployment metrics their organization tracked.
Furthermore, we asked respondents what metrics they believed to be most important for continuous delivery (CD).
DEPLOYMENT FREQUENCY
Finding the right release schedule for an application can be a tightrope walk; it's a difficult balance between releasing too
slowly and risking falling behind competitors in terms of new features or letting bugs remain unfixed, or releasing too quickly
and introducing half-finished features or adding more bugs to the application. This can become even more complex when
an organization is working with multiple applications, all with their own unique release requirements. We wanted to see how
often organizations were releasing their applications to production, as well as how appropriate respondents felt new feature
release schedules were, so we asked the following:
How frequently does your organization release to production?

and

Overall, your organization releases new software features: {Too quickly, Too slowly, At the optimal rate, I have no opinion}

Figure 2: Perceived rate of new-feature releases
Observations:
1. One third of respondents' organizations release to production on a weekly basis or faster, one third release not quite
weekly but still multiple times per month, and one third release once or less than once per month. Of course, applications
and organizations will have individual needs which may dictate how often production releases should be done.
Respondents at mid-sized organizations, on average, indicated faster release schedules than others; 61% of those at
organizations with 20-49 employees and 50% of those at orgs with 50-99 employees release once a week or more; only
21% of respondents at organizations with 10,000+ employees reported releasing at this frequency.
The type of software also had some correlation with release frequency. Respondents who said they are developing
boxed software with over-the-web updates (57%) or native mobile apps (48%) were much more likely than the average
respondent to release once a week or more. 48% of respondents developing embedded software said that they
performed releases once a month or less.
2. Almost half (48%) of respondents thought their organization was releasing new software features at the optimal rate, 29%
thought new-feature releases were coming too slowly, and 17% thought new-feature releases were coming too quickly.
Comparing these responses to the production release frequency discussed earlier shows 52% of those at organizations
deploying less than once per month believe that new-feature releases are too infrequent, while 32% of those at
organizations deploying once per week or faster believe that these releases come too quickly.
Results (n=187):
Figure 3: Who coordinates and monitors your organization's deployments
Observations:
1. Most responses were fairly evenly split between "Everyone" (39%) and "A [dedicated] release manager" (37%). 18% of
respondents said their team has "A release manager role that shifts."
2. Respondents' perception of their organization's focus on dev vs. ops seems to have little to no bearing on who is coordinating and monitoring deployments. Respondents who said that everyone helps with deployments, on average,
thought their org's focus was 66% on dev and 34% on ops. Respondents who said a dedicated release manager ran
deployments, on average, thought this focus was 63% on dev and 37% ops — a minor discrepancy within the margin
of error. So having dedicated managers in charge of releases does not appear to be indicative of an organization that
emphasizes a focus on operations.
3. Having a dedicated release manager did correlate with less frustration regarding environment drift. Respondents who said that everyone coordinates and monitors deployments were much more likely to say that their organizations' different environments drifted "Far enough to make every deployment nerve-wracking" (25%) than those who said a release manager coordinates deployments. Conversely, those with a release manager were more likely to answer that their organization's environment drift was only "Far enough to cause occasional, mild annoyance" (35%).
4. Having dedicated release managers also correlated with fewer incidents and rollbacks. When asked how often
deployments resulted in incidents or rollbacks, respondents who said that a release manager handled deployments
mainly answered "Rarely" (44%) or "Occasionally" (41%), rather than "Almost every deployment" (15%). This shows
a significant departure from respondents saying that everyone coordinates releases: Regarding the frequency of
deployment rollbacks, only 32% of these respondents answered "Rarely," while 35% answered "Occasionally" and 33%
answered "Almost every deployment."
DEPLOYMENT METRICS
Tracking delivery and deployment metrics is crucial to smoothing release hiccups (or preventing full-blown disasters). But too
many metrics means the most important indicators can get lost in a sea of data. To find out which metrics organizations are
tracking the most, as well as which metrics respondents found most vital to continuous delivery, we asked the following:
What do you believe are the most important metrics for continuous delivery? Please rank in order of most important (top) to
least important (bottom).
Figure 4: Deployment-related metrics tracked by respondents' organizations (percent of respondents): Velocity, Error rate, Response time, Customer satisfaction, Deployment frequency, Lead time for change, Change failure rate, Mean time to recovery (MTTR)

Table 1: Metrics ranked by importance for continuous delivery
Observations:
1. "Velocity" (52%) and "Customer satisfaction" (50%) topped the list of deployment-related metrics tracked, ahead of
"Deployment frequency" (45%), "Response time" (45%), and "Error rate" (43%), implying that speed of development
and addressing customer concerns are moderately more important to organizations than how often code is put into
production, how quick application transactions are, or how often deployments cause issues. This is a shift from the results
of last year's survey, which reported "Deployment frequency" (55%) and "Error rate" (52%) as the most-tracked metrics,
while "Velocity" (42%) was only sixth in regard to the number of responses, being surpassed by "Customer satisfaction"
(50%), "Response time" (46%), and even "Lead time for change" (43%).
2. Respondents believed that "Deployment frequency" is the most important metric for continuous delivery, followed by
"Production downtime during deployment." These were also ranked numbers one and two, respectively, in last year's
survey. "Mean time to recovery" moved from fifth place last year to third this year, surpassing "Lead time" and "Change
failure rate," and "Absolute number of bugs" rose from eighth place last year to sixth this year — but otherwise, rankings
remained in the same order as they appeared in 2022.
Generally, this seems to show an overall agreement by software professionals about the most important metrics
needed for continuous delivery, even if those don't necessarily map exactly to the delivery metrics that organizations
tend to track the most.
2. Change can be quite difficult, so despite the advantages that CD can provide, there are still reasons an organization
might not adopt (or fully adopt) continuous delivery practices or tools. We wanted to see what respondents thought were
the main barriers to their organizations adopting continuous delivery.
3. Once CD has been, at least in part, adopted in an organization, we wanted to know if the results lived up to the hype. We
asked respondents how they felt CD adoption affected the quality of their applications, and looked at other data regarding
effects in the delivery pipeline, to determine how continuous delivery impacted software development and release.
What do you believe are the top reasons for adopting continuous delivery? Please rank in order of most important (top) to
least important (bottom).
Results (n=185):
Table 2
2. Job role appeared to play some part in the top factors for CD adoption. Developers/engineers and developer team leads
ranked "Reduced number of bugs post deployment" much higher than most, with each role ranking that reason in fourth
place. "Improved developer/team flow/productivity" was much more important to DevOps team leads (second place) and
developer team leads (third place) than others. And C-level executives rated reasons affecting costs — "Reduced overhead
costs" (second place), "Reduced maintenance costs" (fifth place), and "Reduced error budget" (eighth place) — as much
more important than other respondents, on average. While this kind of disparity could indicate that continuous delivery
"has something for everyone," it could also create tension between stakeholders over how CD is implemented.
BARRIERS TO ADOPTION
With so many reasons for adopting continuous delivery, why would any organization not buy in? Of course, nothing is ever
really that simple. Barriers to adoption can be numerous and varied, and we wanted to see what software professionals
thought were at the forefront, asking:
What do you believe are your organization's main barriers to adopting continuous delivery? Select all that apply.
Results (n=186):
Figure 5: Main barriers to adopting continuous delivery (bar chart; responses include Corporate culture, Lack of time, and Insufficient budget)
"Disparate automation technologies are not integrated well" (32%), "Lack of time" (30%), and "Our operations teams
don't have the right skill sets" (30%) were also popular responses, again suggesting multiple root causes rather than a
single overarching root cause creating the majority of main adoption barriers.
2. The survey results for this year mostly remained in line with results we saw from last year's CI/CD survey, with a few
notable exceptions. Respondents saying that a "Lack of continuous integration maturity" was a barrier decreased by 11%
(31% in 2022 vs. 20% in 2023). On the other hand, responses of "Our development teams don't have the right skill sets"
saw a 10% increase (29% in 2022 vs. 39% in 2023). There was also a 6% increase in responses that "Disparate automation
technologies are not integrated well" (26% in 2022 vs. 32% in 2023). All other responses were within 3.5% of those from last
year's survey.
ADOPTION RESULTS
After analyzing the benefits, overcoming barriers, implementing continuous delivery, and tracking the metrics, it's time
to review the results and see if CD adoption actually improves the quality of applications. To find out how respondents felt
continuous delivery affected software quality, we asked:
Overall, adopting continuous delivery has made your applications: {Higher quality, Lower quality, No change, Not applicable, I
don't know}
Results (n=189):
Figure 6: Higher quality 77.2%, No change 9.5%, Lower quality 2.6%; "Not applicable" and "I don't know" account for the remaining 6.3% and 4.2%
Observations:
1. The vast majority of respondents (77%) believed that continuous delivery has made their applications higher quality, while
10% believed there was no change, and only 3% thought that CD lowered the quality of their applications. These results
are not significantly different from those we saw in last year's survey, wherein 74% claimed that CD improved software
quality, 9% observed no change, and 4% said quality was lowered by CD.
2. Respondents in the smallest companies (1-4 employees; n=6) saw the least change in quality from CD adoption. Subsets
of respondents in companies with more employees (5+) all had between 78% and 87% saying that continuous delivery
adoption made their applications higher quality, whereas only 17% of respondents in the smallest companies said the
same. 67% saw no change in quality, and 17% said they did not know.*
This would make sense, as it is much less likely that organizations this small would have the capacity to implement
continuous delivery practices at the levels that larger companies could, and DevOps in general would likely be a lower
priority given a much smaller divide between "dev" and "ops."
This correlation may seem obvious, but it may be an indicator that dissatisfaction with continuous delivery results
regarding quality of applications may stem from improper or incomplete continuous delivery adoption. This is an area we
will examine further in the next section as we analyze responses regarding deployment automation and its effects.
*Note: Given the extremely small sample sizes of some of these result subsets, the confidence level for these statistics is low, and
results will need to be revisited in future research to verify or improve accuracy.
AUTOMATION REQUIREMENTS
Getting code from pre-production to production safely and effectively requires more than a few steps. And more time spent
on these steps means a greater divide between the time code is ready and the time a product is in the customer's hands.
Automating deployment steps is one of the primary ways to lessen that divide. We wanted to know what steps were necessary
to deployment automation, so we asked:
Check all things that must be true (i.e., join by Boolean AND) in order to automate production deployments:
Results:
Table 3

| Requirement for automating production deployments | 2023 | 2022 | Change |
| More than 75% of code is covered by unit tests | 58% | 57% | +1% |
| There is a robust code review process | 57% | 75% | -18% |
| Production and pre-production environments are provisioned by the same code | 55% | 72% | -17% |
| Realistic load tests are performed on every build | 39% | 45% | -6% |
| There is an auditable approval process to promote changes from pre-production to production | 36% | 48% | -12% |
| Rollback deployments have been used successfully in production many times before | 34% | 49% | -15% |
| All whitelisted user paths are covered by automated UI tests | 30% | 42% | -12% |
| Most (a fuzzy definition of "most") user paths are covered by automated UI tests | 30% | 37% | -7% |
| Crucial features are called behind feature flags and can be turned off without a new deployment | 26% | 40% | -14% |
Observations:
1. Other responses that were popular but not agreed on by a majority included "Realistic load tests are performed on
every build" (39%), "There is an auditable approval process to promote changes from pre-production to production"
(36%), and "Rollback deployments have been used successfully in production many times before" (34%). Automated UI
tests, feature flags, and other unit test coverage percentage options (> 50% and 100% unit test coverage) were the least
suggested of the list.
2. These results, for the most part, significantly differ from what we recorded from last year's survey. While unit test
coverage more than 75% remained fairly static (+1% from 57% in 2022), the top two deployment automation requirements
from last year — a robust code review process (75% in 2022) and production and pre-production environments
provisioned by the same code (72% in 2022) — fell by 18% and 17%, respectively. Other prerequisites that dropped dramatically from last year include successful past production rollback deployments (-15%), feature flags for crucial features (-14%), auditable approval processes from pre-prod to prod (-12%), and automated UI tests for all whitelisted user paths (-12%).
This downward trend for almost every prerequisite listed seems to indicate a growing disagreement over what a fully
automated production deployment looks like.
Do your organization's deployments to production require any manual steps?

and

Which of the following require manual intervention between dev and production? Select all that apply.*

and

Your organization has automated provisioning and deployment: {For all pre-production environments, For some pre-production environments, For no pre-production environments}
*Note: This question was only displayed to respondents who answered "Yes" to the previous question.
Figure 7: Production deployments require manual steps (Yes: ~61%, No: 37.9%, I don't know: 1.1%)

Figure 8: Steps requiring manual intervention between dev and production: Pulling code, Building artifacts, Compiling source code, Performing static analysis, Performing unit tests, Performing load tests, Performing regression tests, Deploying to staging, Deploying to production

Figure 9: Automated provisioning and deployment for pre-production environments (For no pre-production environments: 9.3%)
Observations:
1. 61% of respondents said their organization's production deployments require some manual steps, while 38% said their
prod deployments are fully automated. This shows a moderate increase in automated deployments from last year's survey,
where 67% said their production deployments required manual steps and only 29% said they were fully automated.
2. 75% of respondents with manual production deployment steps said that they manually performed the actual deployment
to the production server, the most selected manual step by far. 41% said they manually deployed to staging, 34%
said regression tests were manual, and 33% said load tests were manual. We do not have data from 2022 on manual
production deployment steps.
3. 50% of respondents said their organization has automated provisioning and deployment for all pre-production
environments, down slightly from last year's 55%. 41% said their organization only has automated provisioning and
deployment for some pre-prod environments — a minor increase from 37% in 2022. And 9% of respondents said their org
has no automated provisioning and deployment for pre-production environments, the same as last year.
EFFECTS OF AUTOMATION
Lastly, we wanted to see how deployment automation affected other aspects surrounding the development cycle. We
examined the automation statistics we discussed in the last section and analyzed their effects on things like technical debt,
incidents/rollbacks, and deployment frequency.
Table 5: Production release frequency by whether deployments require manual steps

|                          | Multiple times per day | Once per day | Multiple times per week | Once per week | Multiple times per month | Once per month | Less than once per month |
| Manual steps required    | 6% | 2% | 16% | 5%  | 31% | 21% | 20% |
| No manual steps required | 8% | 1% | 18% | 14% | 38% | 13% | 7%  |
Table 6
*Note: We did not include the two "I don't know" responses to "Do your organization's deployments to production require any manual
steps?" in this analysis. Respondent subsets were divided into those who answered "Yes" and those who answered "No" to this question.
Observations:
1. There was no significant correlation between fully automated production deployments and respondents' perception of
technical debt. However, when we looked at how many manual steps respondents said their organizations required, we
found that those who said two or fewer steps (including zero steps for respondents with fully automated prod deploys)
were significantly more likely to report that their organization has the optimal amount of technical debt than those with
three or more manual steps (36% for two or fewer steps vs. 19% for three or more steps).
2. Respondents with fully automated production deployments had moderately quicker release frequencies than those
without. The differences between these sets were insignificant for the fastest frequencies (multiple times per day, once
per day, multiple times per week), but 14% of automated production deploys were once per week compared to only 5%
when manual steps were required. 38% of automated deploys were multiple times per month versus 31% for deploys that
weren't fully automated. Deployments requiring manual steps were more likely to release once per month (21%) or less
than once per month (20%) when compared to fully automated deployments (13% and 7%, respectively).
3. Interestingly, respondents who said that their organization's production deployments did not require any manual steps
were much more likely to report that "Almost every deployment" resulted in an incident or rollback (29%) compared to
those with manual prod deploy steps, with the latter group being more likely to say deployments "Occasionally" resulted
in incidents or rollbacks (44% manual deployments vs. 31% automated).
4. We believe that these discrepancies between the observations we found in the results above versus our expectations —
that fully automated production deployments would produce much more optimal technical debt, much faster delivery
times, and many fewer incidents and rollbacks — may be due, in part, to the lack of agreement on prerequisites to fully
automated deployments that we discussed in a previous section.
Future Research
Our analysis here only touched the surface of the available data, and we will look to refine and expand our DevOps and CI/CD
survey as we produce further Trend Reports. Some of the topics we didn't get to in this report, but were incorporated in our
survey, include:
Please contact [email protected] if you would like to discuss any of our findings or supplementary data.
G. Ryan Spain, Freelance Software Engineer, Former Engineer & Editor at DZone
@grspain on DZone | @grspain on GitHub and GitLab | gryanspain.com
G. Ryan Spain lives on a beautiful two-acre farm in McCalla, Alabama with his lovely wife and adorable
dog. He is a polyglot software engineer with an MFA in poetry; a die-hard Emacs fan and Linux user; a
lover of The Legend of Zelda; a journeyman data scientist; and a home cooking enthusiast. When he isn't
programming, he can often be found playing Minecraft with a glass of red wine or a cold beer.
A Beginner's Guide to
Infrastructure as Code
How IaC Works, Its Benefits, and Common Challenges
Infrastructure as Code (IaC) is the practice of provisioning and managing infrastructure using code and software development
techniques. The main idea behind IaC is to eliminate the need for manual infrastructure provisioning and configuration of
resources such as servers, load balancers, or databases with every deployment. As infrastructure is now an integral part of the
overall software development process and is becoming more tightly coupled with application delivery, it's important that we
make it easier to deliver infrastructure changes.
Using code to define and manage infrastructure and its configuration enables you to employ techniques like version control,
testing, and automated deployments. This makes it easier to prevent all sorts of application issues, from performance
bottlenecks to functionality failures.
This article will explain how IaC works, highlighting both approaches, as well as the benefits and challenges of delivering
infrastructure as code within a DevOps environment.
IaC evolved as a solution for scalable infrastructure management. It allows you to codify the infrastructure to then be able
to create standardized, reusable, and sharable configurations. IaC also allows you to define infrastructure configurations
in the form of a code file. For example, Figure 1 demonstrates how you’d define the creation of an S3 bucket in AWS, using
CloudFormation.
Resources:
S3Bucket:
Type: 'AWS::S3::Bucket'
DeletionPolicy: Retain
Properties:
BucketName: DOC-EXAMPLE-BUCKET
As you define your Infrastructure as Code, you can implement the same practices that you use with application development
code, like versioning, code review, and automated tests.
IaC is commonly implemented with a few categories of tools:

• Configuration management tools that make sure the infrastructure is in the desired state that you previously defined, like Ansible, Chef, or Puppet (see the sketch after this list).
• Provisioning tools (e.g., CloudFormation templates) that allow you to define cloud resources in the form of a JSON or YAML file, and provision that infrastructure on a cloud platform.
• Containerization tools (e.g., Docker, Kubernetes) used to package applications and their dependencies into containers that can be run on any infrastructure.
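To illustrate the desired-state idea, here is a minimal configuration management sketch, assuming Ansible and a hypothetical "webservers" inventory group; it describes the state to converge to rather than the steps to get there:

---
- name: Ensure nginx is installed and running
  hosts: webservers          # hypothetical inventory group
  become: true
  tasks:
    - name: Install nginx
      ansible.builtin.package:
        name: nginx
        state: present       # the desired state, not a procedure
    - name: Start and enable nginx
      ansible.builtin.service:
        name: nginx
        state: started
        enabled: true

Running this playbook repeatedly is safe: Ansible only makes changes when the actual state differs from the declared one.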
Approaches to IaC
There are two different Infrastructure as Code approaches: imperative (procedural) IaC and declarative (functional) IaC.
With the imperative method, the developer specifies the exact steps that the IaC needs to follow to create the configuration.
The user commands the automation completely, which makes this method convenient for more specific use cases where full
control is needed.
The main advantage of the imperative method is that it allows you to automate almost every detail of the infrastructure
configuration. This also means that you need a higher level of expertise to implement this type of automation as it's mostly
done by executing scripts directly on the system.
Here is an example of an imperative way to create an S3 bucket using the AWS CLI:
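A minimal form of that command, assuming the bucket name and region described below, might be:

aws s3api create-bucket \
  --bucket my-new-bucket \
  --region eu-central-1 \
  --create-bucket-configuration LocationConstraint=eu-central-1

(The LocationConstraint configuration is required when creating buckets outside us-east-1.)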
When you run this command, the AWS CLI will create a new Amazon S3 bucket with the name my-new-bucket in the eu-
central-1 region.
With the declarative method, the developer specifies the desired outcome without providing the exact steps needed to
achieve that state. The user describes how they want the infrastructure to look through a declarative language like JSON or
YAML. This method helps with standardization, change management, and cloud delivery. Features can be released faster and
with a significantly decreased risk of human error.
{
  "Resources": {
    "EC2Instance": {
      "Type": "AWS::EC2::Instance",
      "Properties": {
        "InstanceType": "t2.micro",
        "ImageId": "ami-0c94855bac71e",
        "KeyName": "my-key"
      }
    }
  }
}
This script tells CloudFormation to create an EC2 instance of type t2.micro, and CloudFormation will take care of all the required steps to achieve that state with the specific properties you defined.
BENEFITS AND CHALLENGES OF IAC

Codification
• Benefit: Codifying infrastructure helps developers ensure that the infrastructure is configured well without any unintended changes whenever it is provisioned.
• Challenge: Infrastructure as Code can reduce visibility into the infrastructure if it is not implemented or managed properly. Improved visibility can be achieved by making sure that the code is well-documented, easily accessible, standardized, simple, and properly tested.

Configuration drift
• Benefit: IaC is idempotent, meaning that if a change happens that's not in sync with your defined IaC pipeline, the correction to the desired state occurs automatically.
• Challenge: Avoiding manual changes directly in the console reduces challenges with configuration drift in relation to IaC implementation.

Version control
• Benefit: Integrating with the version control system your team uses helps create trackable and auditable infrastructure changes and facilitates easy rollbacks.
• Challenge: As the size and complexity of an organization's infrastructure grow, it can become more difficult to manage using IaC.

Testing and validation
• Benefit: Infrastructure changes can be verified and tested as part of the delivery pipeline through common CI/CD practices like code reviews.
• Challenge: Performing code reviews to ensure infrastructure consistency is not always enough — there are usually a variety of testing options specific to a use case.

Cost
• Benefit: Automating time-consuming tasks like infrastructure configuration with IaC helps minimize costs and reallocate resources to more critical assignments.
• Challenge: Extra costs can easily ramp up if everyone is able to create cloud resources and spin up new environments. This usually happens during the development and testing phases, where developers create resources that can be forgotten after some time. To prevent this, it's a good idea to implement billing limits and alarms.

Speed
• Benefit: There may be a higher initial investment of time and effort to automate infrastructure delivery, but automating IaC brings faster and simpler procedures in the long run.
• Challenge: For organizations that run simple workloads, the process of automating and managing IaC can become more burdensome than beneficial.

Error handling
• Benefit: Automating with IaC eliminates human-made infrastructure errors and reduces misconfiguration errors by providing detailed reports and logs of how the infrastructure works.
• Challenge: In complex infrastructure setups, it can be extremely challenging to debug and troubleshoot the infrastructure, especially when issues arise in production environments.

Security
• Benefit: You can define and execute automated security tests as part of the delivery pipeline. Security experts can review the infrastructure changes to make sure they comply with the standards, and security policies can be codified and implemented as guardrails before deploying to the cloud.
• Challenge: As IaC is a much more dynamic provisioning practice that can be used to optimize infrastructure management, it can be misused almost as easily. IaC can make it easier to unintentionally introduce security vulnerabilities, such as hard-coded credentials or misconfigured permissions.
To ensure that IaC is implemented properly, it's important to start small. You want to gradually increase the complexity of the
tasks, avoid over-complicating the codebase, continuously monitor the IaC implementation to identify areas for improvement
and optimization, and continue to educate yourself about the different tools, frameworks, and best practices in IaC.
As a co-founder of Microtica, Marija helps developers deploy their applications on the cloud in minutes.
She's a software engineer with more than nine years of experience and now works as a product person
and technical writer full time. She writes about cloud, DevOps, GitOps, and Kubernetes topics.
Ten years ago, releasing software once per quarter meant teams had plenty of time to align during weekly meetings. DevOps
and technological advancements have transformed development and helped us launch faster, with higher quality, and more
often than ever.
However, these transformations have also unintentionally introduced more complexity and pressure on engineering teams.
Specifically:
The software development lifecycle has changed, challenging teams to reimagine the way they work together and
communicate. At Slack, we have learned how to successfully transform our own engineering team, and we have helped many of our customers accelerate and simplify DevOps. Here are a few simple best practices to follow.
Slack has more than 2,500 out-of-the-box integrations with tools like Jira, GitHub, Jenkins, and AWS. They bring alerts into
channels and allow users to act without leaving Slack, drawing information out of multiple tools and siloed inboxes directly to
the right teams with added transparency. Using Do Not Disturb mode and customized notifications encourages asynchronous
work and enhances productivity.
Lightweight workflows and custom apps on the Slack platform enable engineers to custom-tailor automation to streamline
processes such as async standup meetings, requesting API or tool access, code reviews, declaring and managing incidents, and
submitting and viewing change requests. And if all else fails, ask the masses in your friendly neighborhood #help channel.
Shift-Left: A Developer's
Pipe(line) Dream?
The Traditional SDLC Is Broken and Long Overdue for
a "Shift" in Direction
The software development life cycle (SDLC) is broken. In fact, the implementation steps were flawed before the first project
ever utilized Winston Royce's implementation steps.
Dr. Winston Royce stated "the implementation described above is risky and invites failure" in the same lecture notes that
presented this very illustration back in 1970. Unfortunately, that same flaw has carried over into the iterative development
frameworks (like Agile) too.
What is broken centers around when and how the quality validation aspect is handled. In Figure 1, this work was assumed to
be handled in the testing phase of the cycle. The quality team basically sat idle (from a project perspective) until all the prior
work was completed. Then, the same team had a mountain of source code which had to be tested and validated — putting the
initiative in a position where it is difficult to succeed without any issues.
If it were possible to extract a top 10 list of lessons learned over the past 50+ years in software development, I am certain
that the consequence of placing quality validation late in the development lifecycle is at the top of the list. This unfortunate
circumstance has had a drastic impact on customer satisfaction — not to mention the livelihood of products, services, or entire
business entities.
Employing a shift-left approach redistributes when the quality aspects are introduced into the lifecycle:
• Plan – Expose fundamental flaws sooner by leveraging and validating specifications like OpenAPI.
• Create – Establish unit and integration tests before the first PR is created.
• Verify – Include regression and performance/load tests before the first consumer access is granted.
• Package and so on – Ensure the CI/CD pipelines perform automated test execution as part of the lifecycle. This includes
end-to-end and sanity tests designed to validate changes introduced in the latter phases of the flow.
The benefits of this shift include:

• Defects in requirements, architecture, and design are caught near inception — saving unnecessary work.
• Difficulty in trying to comprehend and organize a wide scope of use cases to validate is avoided by the "test early and
often" approach inherent within shift left.
• Understand performance expectations and realities, which could potentially drive design changes.
• Determine breaking points in the solution — before the first production request is made.
• Avoid having to make design updates late in the lifecycle, which is often associated with unplanned costs.
• Higher staff levels required to fully test at the end of development are avoided by dedicating smaller staff levels to
participate throughout the development lifecycle.
TEST COVERAGE
In a "test early and often" approach, actual tests themselves are vital to the process. The important aspect is that the tests are
functional — meaning they are written in some type of program code and not executed manually by a human. Some examples
of functional tests that can adhere to shift left compliance are noted below:
• API tests created as a result of an API-first design approach or similar design pattern
• Unit tests introduced to validate isolated and focused segments of code
• Integration tests added to confirm interactions across different components or services
The more test coverage that exists, the better the solution will be. As a rule of thumb, API and unit tests should strive for 100%
code coverage and the remaining tests should strive for 90% coverage since reaching full coverage is often not worth the
additional costs required.
PIPELINE CONFIGURATION
Adoption of the shift-left approach can require updates to the pipeline configuration as well. In addition to making sure the
tests established above are part of the pipeline, sanity and end-to-end functional tests should be included. These sanity tests
are often short-running tests to validate that each entry point into the solution is functioning as expected. The end-to-end
functional tests are expected to handle the behavioral aspects of the application — validating that all of the core use cases of
the solution are being exercised and completed within the expected benchmarks.
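As a sketch of how this might look in a GitLab-CI-style configuration (the stage name and script names below are hypothetical), the sanity and end-to-end tests become ordinary pipeline jobs:

stages: [verify]

sanity-tests:
  stage: verify
  script:
    - /bin/bash run-sanity-tests.sh    # short-running checks that each entry point responds as expected

end-to-end-tests:
  stage: verify
  script:
    - /bin/bash run-e2e-tests.sh       # validates that core use cases complete within expected benchmarks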
RELEASE CONFIDENCE
The end-result of shift left in action is a high degree of confidence that the release will be delivered successfully — minimizing
the potential for unexpected bugs or missed requirements to be caught by the customer. This is a stark contrast to prior
philosophies that grouped all the testing at the end of the lifecycle — with a hope that everything was considered, tested, and
validated.
Too Idealistic?
Like any lifecycle or framework, there are some key challenges that must be addressed and accepted before shift left is
adopted into an organization to avoid a "too idealistic" conclusion.
LEVEL-SET EXPECTATIONS
The shift-left approach will also require expectations to be established and clarified early on. This builds upon the last challenge because each step of the lifecycle will likely require more time to complete. However, the overall time to complete the project should remain about the same as that of projects that attempt full and successful testing at the end of the lifecycle.

It is vital to remember that defects found through shift-left adoption will be less costly to resolve. Because testing is completed within a given phase, there is less chance of discovering issues late that must be traced back to and fixed in prior phases of the lifecycle.
Conclusion
Throughout my life, I have always been a fan of blockbuster movies: the ones made with a large budget and a cast that includes a few popular actors. I was thinking about the 1992 Disney movie Aladdin, which builds upon a Middle-
Eastern folktale and features the comic genius of the late Robin Williams. I remember there being a scene where the genie
gives Aladdin some information about the magical lamp. Immediately, the inspired main character races off before the genie
can provide everything Aladdin really needs to know. It turns out to be a costly mistake.
I feel like Dr. Winston Royce was the genie while the rest of us raced through the lifecycle without a desire to hear the rest of
the story, like Aladdin. Decades and significant cost/time expenditures later, a new mindset has finally emerged which builds
upon Royce's original thoughts.
To succeed, implementing shift left must be a top-down decision — driven by an organization-wide change in mindset and
supported by everyone. Never should a reduction in quality or performance testing be considered as a way to shorten the time
required to release a new feature, service, or framework.
References:
John Vester, Lead Software Engineer at Marqeta, Freelance Writer, & DZone Core Member
@johnjvester on DZone | LinkedIn, Twitter, GitLab, GitHub, dockerhub
IT professional with 30+ years of expertise in app design and architecture, feature development, and
project and team management. Currently focusing on enterprise architecture/app design utilizing
object-oriented programming languages and frameworks. Prior expertise in building (Spring Boot)
Java-based APIs against React and Angular client frameworks, CRM design, customization, and integration with
Salesforce. Additional experience using C# (.NET Framework) and J2EE (including Spring MVC, JBoss Seam, Struts Tiles,
JBoss Hibernate, Spring JDBC).
Deliver software at scale.

Get started: buildkite.com/signup →
PARTNER CASE STUDY
COMPANY: Intercom
COMPANY SIZE: 120 engineers | 1,013 total employees
INDUSTRY: Software
PRODUCTS USED: Buildkite CI/CD
PRIMARY OUTCOME: 85 percent reduction in test time: The Intercom team can now run thousands of tests in under three minutes.
CREATED IN PARTNERSHIP WITH Buildkite

Challenge

The Intercom team deploys about 150 times per day across multiple applications in the codebase. The continuous shipping of code requires the team to run a large number of tests, and they needed a CI solution that could keep up with their demands.

Previously, Intercom was using two other CI tools in order to achieve the stability they needed to build and enable Intercom developers to do performance work on either of those environments. Even with dual solutions, they were unable to achieve the speed and control levels needed. In the end, neither platform allowed the team to optimize. Plus, it was time-consuming and expensive to keep up both tools.

The team needed reliability, control, and speed. "Getting the reliability of tests to run correctly was the first thing we needed to focus on. Having full insight into what was going on would give us more visibility, and the ability to fix things."

Solution

"We just wanted a reliability improvement, so that was the small thing we were going to try out first," reports Scanlan. Once they saw that reliability was no longer an issue, the team began optimizing for speed and moving all of their builds over to Buildkite.
Results
While it used to take 20 to 25 minutes, the Intercom team can now run tens
of thousands of tests in just three minutes. This provides their developers with
real-time feedback which translates to a better developer experience and also
gives them the ability to be more responsive when doing things like rolling
back problems, getting changes out, or dealing with security problems.
Continuous integration/continuous delivery (CI/CD) pipelines have become an indispensable part of releasing software, but
their purpose can often be misunderstood. In many cases, CI/CD pipelines are treated as the antidote to release woes, but in
actuality, they are only as effective as the underlying release process that they represent. In this article, we will take a look at a
few simple steps to create an effective CI/CD pipeline, including how to capture and streamline an existing release process, and
how to transform that process into a lean pipeline.
While the specifics of a release process will vary — some may require certain security checks while others may need approvals
from third parties — nearly all software release processes share a common purpose to:
Every team that delivers a product to a customer has some release process. This process can vary from "send the artifacts to
Jim in an email so he can test them" to very rigid and formalized processes where teams or managers must sign off on the
completion of each step in the process.
Capturing the steps of the current release process is the first step to creating a pipeline. A few terms describe the parts of a pipeline:
• Step – A single action, such as Build, Unit Tests, or Staging, in the release process (i.e., the boxes).
• Stage – A single phase in the release process, containing one or more steps. Generally, stages can be thought of as the
sequential columns in a pipeline. For example, Build is contained in the first stage, Unit Test in the second stage, and
User Tests and Staging in the fifth stage. When there is only one step in a stage, the term step and stage are often used
synonymously.
• Pipeline – A set of ordered steps.
• Trigger – An event, such as a check-in or commit, that starts a single execution of the pipeline.
• Gate – A manual step that must be completed before all subsequent steps may start. For example, a team or manager
may need to sign off on the completion of testing before the product can be deployed.
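To make these terms concrete, here is a minimal, hypothetical GitLab-CI-style configuration (job, stage, and script names are illustrative): each commit acts as the trigger, each job is a step, and the manual deploy job acts as a gate.

stages: [build, test, deploy]        # three sequential stages

build:                               # a step in the first stage
  stage: build
  script:
    - /bin/bash build.sh

unit-tests:                          # a step in the second stage
  stage: test
  script:
    - /bin/bash run-unit-tests.sh

deploy:                              # a gate: someone must start this step manually
  stage: deploy
  when: manual
  script:
    - /bin/bash deploy.sh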
A CI/CD pipeline is simply an automated implementation of a formalized release process. Therefore, if we wish to create an
effective CI/CD pipeline, it's essential that we optimize our release process first.
1. Streamline the process – We should minimize any bottlenecks or artificial steps that slow down our release process.
– Remove any unnecessary steps.
– Minimize the number of steps while fulfilling business needs.
– Simplify any complex steps.
– Remove or distribute steps that require a single point-of-contact.
– Accelerate long-running steps and run them in parallel with other steps.
2. Automate everything – The ideal release process has no manual steps. While this is not always possible, we should
automate every step possible.
– Consider tools and frameworks such as JUnit, Cucumber, Selenium, Docker, and Kubernetes.
– Capture the process for running each step in a script — i.e., running the build should be as easy as executing build.sh (see the sketch after this list). This ensures there are no magic commands and allows us to run each step on-demand when troubleshooting or replicating the release process.
– Create portable scripts that can be run anywhere the release process is run. Do not use commands that will only work
on specific, special-purpose environments.
– Version control the scripts, preferably in the same repository as the source code.
3. Shorten the release cycle – We should release our product as often as possible. Even if the end deliverable is not shipped
to the customer or user (e.g., we build every day but only release the product to the customer once a week), we should
be frequently running our release process. If we currently execute the release process once a day, we should strive to
complete it on every commit.
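As a sketch of the build.sh idea from step 2, a portable build script might look like the following; the Maven command is an illustrative assumption, not a prescribed toolchain:

#!/usr/bin/env bash
# Portable build entry point: the entire build runs from this one script,
# so the pipeline configuration stays thin and the build can be run
# on-demand from any machine.
set -euo pipefail

# Illustrative command; substitute your project's real build tooling.
mvn -B clean verify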
Optimizing the release process ensures that we are building our CI/CD pipeline from a lean and efficient foundation. Any bloat
in the release process will be reflected in our pipeline. Optimizing our release process will be iterative and will take continuous
effort to ensure that we maintain a lean release process as more steps are added and existing steps become larger and more
comprehensive.
1. Don't follow fads – There are countless gimmicks and fads fighting for our attention, but it is our professional
responsibility to select our tools and technologies based on what is the most effective for our needs. Ubiquity and
popularity do not guarantee effectiveness. Currently, options for CI/CD pipeline tools include GitHub Actions, GitLab
CI/CD, and Jenkins. This is not a comprehensive list, but it does provide a stable starting point.
2. Maintain simplicity – Each step should ideally run one script with no hard-coded commands in the pipeline
configuration. The pipeline configuration should be thought of as glue and should contain as little logic as possible.
build:
stage: building
script:
- /bin/bash build.sh
unit-tests:
stage: unit-testing
script:
- /bin/bash run-unit-tests.sh
integration-tests:
stage: integration-testing
script:
- /bin/bash run-integration-tests.sh
...
deploy:
stage: deployment
script:
- /bin/bash deploy.sh --env production:443 --key ${SOME_KEY}
This ideal is not always possible, but this should be the goal we strive toward.
3. Gather feedback – Our pipelines should not only produce artifacts, but they should also produce reports. These reports
should include:
– Test reports that show the total, passed, and failed test case counts
– Reports that gauge the performance of our product under test
– Reports that show how long the pipeline took to execute — overall and per step
– Traceability reports that show which commits landed in a build and which tickets — such as Jira or GitHub tickets —
are associated with a build
This feedback allows us to optimize not only our product, but the pipeline that builds it as well.
By following these tips, we can build an effective pipeline that meets our business needs and provides our users and customers
with the greatest value and least amount of friction.
Conclusion
CI/CD pipelines are not a magic answer to all of our release problems. While they are important tools that can dramatically
improve the release of our software, they are only as effective as our underlying release processes. To create effective pipelines,
we need to streamline our release processes and be vigilant so that our pipelines stay as simple and as automated as possible.
Further reading:
• Continuous Delivery by Jez Humble & David Farley
• Continuous Delivery Patterns and Anti-Patterns by Nicolas Giron & Hicham Bouissoumer
• Continuous Delivery Guide by Martin Fowler
• Continuous Delivery Pipeline 101 by Juni Mukherjee
• ContinuousDelivery.com
Justin Albano is a Software Engineer at IBM responsible for building software-storage and backup/
recovery solutions for some of the largest worldwide companies, focusing on Spring-based REST API and
MongoDB development. When not working or writing, he can be found practicing Brazilian Jiu-Jitsu,
playing or watching hockey, drawing, or reading.
Driving developer productivity and multi-cloud deployments with EaaS

Release makes it easy and cost-effective to create cloud-native full-stack environments with data, on-demand within your CI/CD workflow. High-performing development teams are using Release in their workflows to create full-featured environments with every code check-in for each developer. Release allows these teams to preview and QA features in the dev workflow, run production applications, and deliver SaaS products into their customer's cloud environment.

See how Release enables app delivery teams to deploy more code, faster, without worrying about infrastructure.

Start now: release.com
PARTNER CASE STUDY
COMPANY: Chipper Cash
COMPANY SIZE: 350 employees
INDUSTRY: Financial Services
PRODUCTS USED: Ephemeral environments, app imports
PRIMARY OUTCOME: Adding Release to their CI/CD pipeline allowed Chipper to reduce application testing time from 24 hours to five minutes.

"Release has helped cut our testing time from days to minutes and provided insight to how we can optimize internally."

Challenge

Chipper Cash has a global cohort of team members based across the world. For Chipper Cash's population of engineers, it was important to have quick and easy access to tools and systems that help combat complexities around collaboration and learning.

Chipper Cash wanted to set up local builds (or isolated environments) that would emulate the back end, so developers could test changes. Think about it this way — if you're developing a feature or trying to test a bug, it can be difficult and time-intensive, as there are many moving pieces to the back end of an application.

Chipper Cash found that the engineers only had two options: ship a massive image all at once (which can cause major outages), or make changes quickly, which introduces risk. Additionally, Chipper Cash needed a responsive tool that was also natively integrated with their source code.

Before Release, Chipper Cash hosted their staging environment, or as they refer to it, their sandbox environment, in Heroku. However, a single staging environment meant dozens of developers were waiting in line to test changes. This became increasingly problematic as they added more engineers to their team, resulting in more pull requests and unmanageable operational overhead. Now, engineers can quickly spin up the environment, test some features, and push code through continuous integration pipelines.
Results
With Release, Chipper Cash has cut testing time from ~24 hours to about
five minutes. Now within their own dashboard, the team also has the ability
to view pull requests from other engineers and get a visual snapshot of the
code that is being pushed out daily, increasing teamwork and collaboration.
Passionate about computing since writing my first lines of code in BASIC on an Apple II, I share my time
raising my young daughter and working on AWS Cloud Quest and AWS Industry Quest, a fun learning
experience based on 3D games. In my (little) spare time, I like to make comics related to programming,
operating systems, and funny situations in the routine of an IT professional.
CONTROL & SECURE YOUR SOFTWARE SUPPLY CHAIN

BUILD. SECURE. DISTRIBUTE. CONNECT.

JFROG.COM/START-FREE
PARTNER CASE STUDY
COMPANY: Bendigo and Adelaide Bank
COMPANY SIZE: 4,800 employees
INDUSTRY: Banking
PRODUCTS USED: Kubernetes, Maven, Ruby Gems, Docker, AWS EKS, Helm Charts, Google Cloud
PRIMARY OUTCOME: By migrating from on-prem to AWS EKS, Bendigo and Adelaide Bank accelerated build times, lowered operating costs, and positioned themselves for future multi-cloud operation.

"It was really easy to set up an instance of Artifactory and Xray in a Kubernetes environment with one command and a few values."
— Caio Trevisan, DevOps Service Owner, Bendigo and Adelaide Bank

CREATED IN PARTNERSHIP WITH JFrog

Challenge

The team maintains a very mature DevOps pipeline and has committed to a four-year plan to transform the bank, "to get at least 80 percent of our applications in the cloud," while maintaining compliance with rigorous banking regulations.

To modernize, the DevOps Service team needed to migrate their JFrog Platform on-premises HA installation to self-managed K8s clusters in a cloud service provider, with enhanced developer productivity while remaining compliant with regulations.

Solution

The DevOps Services team chose Amazon Web Services (AWS) EKS to host the Artifactory repository and Xray security. Using the Helm charts from JFrog made it "really easy to set up an instance of Artifactory and Xray in a Kubernetes environment with one command and a few values." Within a single hour, they can spin up a test or production cloud environment for the JFrog Platform.

JFrog Artifactory and Xray enable around 500 developers to safely use 15 unique package types (primarily Maven, Ruby Gems, and Docker) across 600+ cloud-native applications. Through the advanced DevOps best practices enabled by Artifactory — including remote repositories, buildinfo metadata, and promotion of immutable builds — as well as vulnerability scans by Xray, the bank deploys Kubernetes clusters into production daily.

By federating repositories between the on-prem and AWS installations, the bank was able to duplicate 1TB of accumulated packages, artifacts, and binaries data to a new cloud environment in AWS. This bidirectional mirroring capability in Artifactory, along with user token synchronization through JFrog Access Federation, enabled developer teams to seamlessly transition their repository use to the AWS environment with zero disruptions to daily operation.

Migration of all data and teams to the new AWS environment — including initial tests as well as rigorous internal compliance and governance procedures — was completed within six months.
Results
Now set up for greater resilience after a successful migration to the cloud,
Bendigo and Adelaide Bank saw results, including:
GitOps has taken the DevOps world by storm since Weaveworks introduced the concept back in 2017. The idea is simple: use
Git as the single source of truth to declaratively store and manage every component for a successful application deployment.
This can include infrastructure as code (e.g., Terraform), policy documents (e.g., Open Policy Agent, Kyverno), configuration
files, and more. Changes to these components are captured by Git commits and trigger deployments via CI/CD tools to reflect
the desired state in Git.
GitOps builds on recent shifts towards immutable infrastructure via declarative configuration and automation. By centrally
managing declarative infrastructure components in Git, the system state is effectively tied to a Git commit, producing a
versioned, immutable snapshot. This makes deployments more reliable and rollbacks trivial. As an added benefit, Git provides a
comprehensive audit trail of changes and puts stronger guardrails in place to prevent drift in the system.
Finally, it promotes a more consistent CI/CD experience, as all operational tasks are now fully captured via Git operations. Once the
pipeline is configured, developers can expect a standard Git workflow to promote their changes to different environments.
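To make the desired-state model concrete, below is a minimal Python sketch of the reconciliation loop at the heart of GitOps controllers such as Argo CD and Flux. The two read_* helpers, the service name, and the commit SHA are hypothetical stand-ins; a real controller would parse manifests at a Git commit and query the cluster API.

```python
# Minimal sketch of a GitOps-style reconciliation loop.
# The read_* helpers are hypothetical stand-ins: a real controller
# would read manifests at a Git commit and query the cluster API.

def read_desired_state(git_commit: str) -> dict:
    # Pretend these manifests were parsed from the repo at `git_commit`.
    return {"payment-api": {"image": "payment-api:1.4.2", "replicas": 3}}

def read_live_state() -> dict:
    # Pretend this came from the cluster API.
    return {"payment-api": {"image": "payment-api:1.4.1", "replicas": 3}}

def reconcile(git_commit: str) -> None:
    desired = read_desired_state(git_commit)
    live = read_live_state()
    for name, spec in desired.items():
        if live.get(name) != spec:
            # A real controller would apply the manifest here; the Git
            # commit remains the single source of truth for the system.
            print(f"drift detected for {name}: {live.get(name)} -> {spec}")

reconcile("a1b2c3d")  # hypothetical commit SHA
```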
Even though the benefits of GitOps are well documented, best practices for implementing GitOps are still being formed.
After all, the implementation details will depend on the nature of the existing code repositories, the size and makeup of the
engineering teams, as well as practical needs for imperative changes (e.g., emergency rollback or break-glass procedures).
In this article, we'll look at how to choose the best strategy for embracing GitOps with the aforementioned considerations
in mind.
Fortunately, GitOps is not tied to a particular framework, but unless your organization already has robust tooling to deal
with monorepo builds, such as Bazel, the general recommendation is to at least separate application source code from
deployment artifacts. Separating the two has several advantages:
1. Deployment cadence for application and infrastructure changes can be more easily separated and controlled.
For example, application teams may want every commit to trigger a deployment in lower environments, whereas
infrastructure teams may want to batch multiple configuration changes before triggering a deployment (or vice versa).
2. There may be compliance or regulatory requirements governing who has access to deploy certain parts of the
application stack. Some organizations may only allow a Production Engineering or SRE team to trigger
production deployments. Having separate repos makes access controls and audit trails easier to configure.
3. For applications with dependent components that necessitate deployment as a single unit, a separate configuration repo
allows multiple application or external dependency repos to push changes independently. Then the CD tool can monitor
a single configuration repo and deploy all the components at the same time.
With that said, the exact point of separation for what belongs in the application repo versus the deployment artifacts repo
depends on the composition of your team. Small startups may expect application developers to be responsible for application
code and other deployment artifacts (e.g., Dockerfile, Helm charts, etc.). In that case, keeping those manifests in the same repo
and just keeping Terraform configs in another repo may make sense.
As for larger organizations with dedicated DevOps/infrastructure teams, those teams may own Kubernetes
components exclusively and maintain them separately from application code. How far to split the deployment repos
themselves depends on team size and tooling maturity:
• For small teams with a small number of cloud accounts/projects, it will be easier to have a single repo to host all
deployment configs to trigger deployments to a small number of environments. As the infrastructure evolves, non-prod
artifacts can be separated from prod repos.
• For mid-sized teams with slightly more sophisticated tooling and complex cloud infrastructure (e.g., multiple projects
nested under organizations, hybrid-/multi-cloud), a repo per team may work well. This way different controls can be
implemented based on security or compliance needs.
• At the other end of the spectrum, a repo per service and environment provides the most flexibility in terms of controls for
large teams with robust tooling.
Another critical component to call out with GitOps is secret management. Since secrets can't be checked into Git as plaintext,
a separate framework is required to handle them. Some frameworks, like Bitnami Sealed Secrets, check encrypted data
into Git and use a controller/operator in the deployed environment to decrypt the secrets. Others separate secrets entirely and
leverage secret stores such as HashiCorp Vault.
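As a toy illustration of the encrypt-before-commit pattern that tools like Sealed Secrets implement, the sketch below uses the cryptography package's Fernet recipe. This is an assumption-laden simplification: real tools use asymmetric keys so that only an in-cluster controller can ever decrypt, and the secret value here is made up.

```python
# Toy illustration of the encrypt-before-commit pattern (pip install cryptography).
# Sealed Secrets and similar tools use asymmetric keys held by an in-cluster
# controller; symmetric Fernet is used here only to keep the sketch short.
from cryptography.fernet import Fernet

key = Fernet.generate_key()         # in practice, held by the cluster controller
cipher = Fernet(key)

plaintext = b"DB_PASSWORD=s3cr3t"   # fabricated example; never commit this as-is
sealed = cipher.encrypt(plaintext)  # this ciphertext is safe to check into Git

# Later, inside the deployed environment, the controller decrypts it:
assert cipher.decrypt(sealed) == plaintext
```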
Finally, it's important to call out that even with GitOps, configuration drift may still occur. In fact, for small teams with immature
tooling, it may be critical to leave some room for imperative changes. For time-critical outages, manual fixes may be necessary
to restore the service first before running through the official pipeline for a long-term fix. Before enforcing stringent policies,
test rollback and restore strategies to ensure that GitOps systems do not hinder emergency fixes.
Conclusion
"GitOps is the best thing since configuration as code. Git changed how we collaborate, but declarative configuration is the key
to dealing with infrastructure at scale, and sets the stage for the next generation of management tools."
– Kelsey Hightower
GitOps applies the best practices that application developers have learned over the years from interacting with Git to
declarative configuration and infrastructure. GitOps produces a versioned, immutable state of the system with an audit trail of every change.
The good news is that GitOps is flexible enough to adapt as your needs change. There is a plethora of GitOps tooling available
in the market today. Start with a simple tool to reap the benefits of GitOps as you grow your technology stack to boost
developer productivity.
Yitaek Hwang is a software engineer at NYDIG working on Bitcoin technology. He often writes about
cloud, DevOps/SRE, and crypto topics.
PARTNER CASE STUDY

COMPANY: Nederlandse Spoorwegen
COMPANY SIZE: 30,000+ employees
INDUSTRY: Travel
PRODUCTS USED: Sauce Cross-Browser, Sauce Mobile
PRIMARY OUTCOME: NS built and released a new app in just three months thanks to Sauce Labs' commitment to automation, an easy onboarding process, and the widest coverage of devices and operating systems.

Challenge
Nederlandse Spoorwegen (NS), the Netherlands' national railway system, plays an essential role in millions of daily lives by providing train, bike, and car transportation options via their app or website.

NS wanted to better serve customers by providing a more streamlined, inclusive, and personalized experience. They were planning a move from waterfall methodology to agile methodology, which required its testers to significantly scale up test volume. To accomplish this, NS needed to increase test coverage overall. Unfortunately, testing was already a painfully slow and manual process, with testers relying on their own infrastructure and phones. These practices left testing incomplete and inconsistent, leading to negative customer experiences.

Another problem NS struggled with was the onboarding of staff. Without a centralized system, onboarding and knowledge sharing were difficult.

Solution
In a thorough evaluation of testing solutions, the NS testers were impressed with Sauce Labs, citing the solution's intuitiveness and dashboard as key differentiators. "The ease of working stood out," van Veenendaal says. "For example, working with the cross-browser tests really makes a difference." Ultimately, the company chose Sauce Labs because of its wide, reliable testing coverage, local data center, and outstanding customer support.

Results
NS is extremely satisfied with Sauce Labs. It now uses a local data center, which helps the company avoid security issues while improving performance time by 50 percent. "Now, testing is faster and more thorough," van Veenendaal says. "If we have any issues, we find them quickly. It's simply more efficient to work with one dashboard and one tool." The tribal knowledge problems and painful onboarding experiences are also a thing of the past. "The developers use Sauce Labs from their local environments and are very happy with all the possibilities. They can test and check all the things they want to know, seeing in real time if a new feature is broken," van Veenendaal says.

"Perhaps most important of all, Nederlandse Spoorwegen can provide customers with a better experience. We're able to offer our customers a more personalized experience."
— Serge van Veenendaal, Non-Functional Test Specialist, Nederlandse Spoorwegen

CREATED IN PARTNERSHIP WITH
Continuous integration (CI) and continuous delivery (CD) are two core principles of DevOps that can be applied to machine
learning (ML) and data science projects, enabling teams to quickly iterate on their data and models and to deploy them
in a more reliable, secure manner. CI/CD pipelines provide end-to-end automation, allowing data scientists and ML engineers
to focus on the tasks for which they are best suited, such as data engineering, cloud architecture, analytics, model training,
and more. With CI/CD in place, teams gain better quality control, faster iterations, and increased collaboration with other
team members. Key benefits include:
• Increased efficiency and speed of model development – automatically test and deploy new models, reducing the time
it takes to bring a new model into production
• Improved collaboration and coordination – work on different parts of a project simultaneously, automatically merging
changes without conflicts
• Enhanced reproducibility and traceability – track the entire model development process, from code to data to results,
making it easy to reproduce and understand the decisions that led to a particular model
• Better model governance – improve management and governance of organizations' ML models to reduce risk and
increase trust through the creation of a clear and auditable process for model development and deployment
Despite the numerous benefits, setting up CI/CD for data science and ML applications can be a daunting task. It is not as easy
as setting up a CI/CD pipeline for Go, Node.js, or any other programming language. There are certain things that need to be
addressed before moving forward, such as the following challenges:
• Data dependencies – The management of complex data dependencies in data science and ML applications can make it
challenging to automate the process of building, testing, and deploying models.
• Data versioning – This is a crucial part, as it allows for tracking changes to data and reproducing experiments (a minimal illustration follows this list).
• Environment setup – Data science and ML applications often have complex environment dependencies, such as
specific versions of libraries and frameworks, that make it challenging to set up and manage the environments needed
to run the models.
• Reproducibility – Ensuring reproducibility of the models and experiments is a key requirement in data science and ML,
but it's hard to achieve in a CI/CD pipeline.
• Security – Sensitive data is often involved, and organizations need to ensure that the data is protected throughout the
CI/CD pipeline.
• Scalability – Data science and ML applications can require large amounts of data and computational resources, which
can make it challenging to scale the CI/CD pipeline.
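To make the data-versioning challenge concrete, one simple approach is to pin each dataset to a content hash, which is essentially what tools like DVC automate. The sketch below is a minimal, hand-rolled illustration; the data/train.csv path is a hypothetical example.

```python
# Minimal illustration of pinning a dataset version by its content hash.
# Tools like DVC do this robustly; the file path here is hypothetical.
import hashlib
from pathlib import Path

def dataset_fingerprint(path: str) -> str:
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # read in 1 MiB chunks
            digest.update(chunk)
    return digest.hexdigest()

if __name__ == "__main__":
    path = "data/train.csv"  # hypothetical dataset
    if Path(path).exists():
        # Record this hash alongside the model so the exact training
        # data can be traced and experiments can be reproduced.
        print(f"{path}: {dataset_fingerprint(path)}")
```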
Let’s dig a little deeper and see how we can properly implement CI/CD for data science and ML applications. There are several
steps you can take to approach the implementation:
• Set up a version control system, such as Git, to track changes to your code and data.
• Use a CI tool to automatically build, test, and validate your code and data.
• Create automated tests for your code and data to ensure that changes do not break existing functionality (an example data check appears below).
• Use a CD tool to automatically deploy your code and data to a staging or production environment.
• Set up monitoring and logging to track the performance of your deployed models and detect any issues.
• Implement a feedback loop to track and use the results of your deployed models to improve them over time.
• Use containerization or virtualization technologies like Docker to package your code and dependencies for easy deployment.
• Use cloud-based services like AWS, GCP, or Azure to deploy your models and infrastructure.
It's important to remember that CI/CD is an iterative process, so you may need to experiment to find the best approach for your
specific use case.
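As one concrete example of the automated-tests step above, a CI tool can run lightweight data checks on every commit. The sketch below uses pytest and pandas; the load_training_data helper and its toy columns are hypothetical stand-ins for the project's real data loader.

```python
# test_data_quality.py - lightweight data checks a CI tool can run on
# every commit (pytest + pandas). The loader is a hypothetical stand-in
# for reading the project's real training data.
import pandas as pd

def load_training_data() -> pd.DataFrame:
    # Fabricated rows for illustration only.
    return pd.DataFrame({"age": [34, 51, 29],
                         "income": [52000, 87000, 43000],
                         "churned": [0, 1, 0]})

def test_expected_columns_present():
    df = load_training_data()
    assert {"age", "income", "churned"} <= set(df.columns)

def test_no_missing_values():
    df = load_training_data()
    assert not df.isnull().values.any()

def test_label_is_binary():
    df = load_training_data()
    assert set(df["churned"].unique()) <= {0, 1}
```

Running pytest on every commit turns these checks into a gate: a schema change or corrupted export fails the build before a model is ever trained on it.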
The pipeline starts with a developer committing code changes to a version control system (VCS) like Git. Next, a CI tool
automatically builds and tests the code, ensuring it is stable and ready for deployment. If the tests pass, the pipeline continues
to the CD stage, where the code is automatically deployed to a Kubernetes cluster. The deployed code includes the MLflow
model and a Kubernetes deployment file, which describes how to run the model in a Kubernetes pod.
Once the model is deployed, it can be accessed via an API endpoint or a web UI, which allows users to send requests to the
model and receive predictions in real time. Additionally, the pipeline can also include monitoring and logging tools to track the
performance of the model and gather feedback from users.
Overall, this pipeline allows for efficient and automated deployment of ML models, reducing the effort required by developers
and allowing them to focus on improving the model.
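As an illustration of that last step, a client might call the deployed model's REST endpoint as sketched below. The URL is hypothetical, and the "inputs" payload shown is one of the formats accepted by recent MLflow model servers, so confirm the schema for your version.

```python
# Hypothetical client call against a deployed model's REST endpoint.
# The URL is made up; the "inputs" payload is one format accepted by
# recent MLflow model servers - confirm the schema for your version.
import requests

ENDPOINT = "http://models.internal.example.com/invocations"  # hypothetical

payload = {"inputs": [[34, 52000], [51, 87000]]}  # two feature rows
response = requests.post(ENDPOINT, json=payload, timeout=10)
response.raise_for_status()
print(response.json())  # e.g., predictions for each row
```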
• Code integration – The code for the data science model is integrated with the pipeline. This can be done using a version
control system like Git, which allows multiple developers to work on the code simultaneously and keeps a record of all
changes made to the code.
• Building – The code is built by compiling and packaging it so it can run on different environments. This can be done
using a tool like Docker, which allows you to build a containerized environment that can be easily deployed on different
machines.
• Testing – The model is tested to ensure it is working as expected. This can be done using unit tests, integration tests, and
end-to-end tests. Automated testing frameworks like pytest can be used for this (a minimal quality-gate example follows
this list), and CI/CD tools can help you build and test your applications.
• Deployment – The model is deployed to a production environment, where it can be used by end-users. This can be done
using tools like Kubernetes or AWS Elastic Beanstalk.
• Monitoring – After deploying, it is important to monitor the model's performance to understand how well it is doing in real-
world scenarios.
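To ground the testing stage, here is a minimal quality-gate test of the kind a CI job could run with pytest. It trains a toy scikit-learn model on synthetic data; the 0.9 accuracy threshold is an arbitrary assumption for illustration, not a recommendation.

```python
# test_model_quality.py - a minimal model quality gate for CI (pytest).
# Uses synthetic data and scikit-learn; the 0.9 accuracy threshold is
# an arbitrary example, not a recommendation.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def test_model_meets_accuracy_floor():
    # Synthetic, well-separated classes so the example runs deterministically.
    X, y = make_classification(n_samples=500, n_features=10,
                               class_sep=2.0, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=42
    )
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    assert model.score(X_test, y_test) >= 0.9  # fail the build if quality drops
```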
Overall, a CI/CD pipeline for data science applications helps to ensure that the model is reliable, maintainable, and can be easily
updated with new features and improvements.
Conclusion
As the whole world is going crazy over data science and ML, there is a huge demand for the CI/CD approach. Applying CI/CD
and DevOps principles to data science and ML projects can help streamline and automate the entire workflow — from data
collection, training, and testing to model deployment. By automating the process, organizations can achieve faster results,
improved scalability, and higher-quality models. Gain a competitive advantage over others by adopting CI/CD and reap the
enormous benefits that this approach provides. Happy DevOpsing!
PARTNER CASE STUDY

COMPANY: Greenway Health
COMPANY SIZE: 1,200 employees
INDUSTRY: Healthcare
PRODUCTS USED: Chef Enterprise Automation Stack, including Infra, InSpec, Habitat, and Automate
PRIMARY OUTCOME: Greenway Health implemented Progress Chef in three data centers to streamline the application release process, shorten deployment time, and improve overall application management.

US-based Greenway Health turned to Chef DevOps to improve its application deployment and management process across three data centers to speed delivery of market-leading electronic health records (EHR), practice management, and revenue cycle management solutions that 55,000+ healthcare providers use to grow profitability, remain compliant, work more efficiently, and improve patient outcomes.

Challenge
With two flagship products from which multiple downstream specialty applications flow, Greenway Health engineers needed to maintain and update about 3,000 application endpoints. On average, the four-person team spent about two weeks to hit every endpoint — a significant use of time and resources for the task.

Solution
Given the bottlenecks in getting applications deployed quickly, Epps and his team set out to find the most agile DevOps automation solution to build continuous delivery pipelines across all applications at scale. Chef provides automation capabilities for defining, packaging, and delivering applications to almost any environment, regardless of the operating system or deployment platform. The Greenway Health team:

• Built continuous delivery pipelines across all applications and all change events with Chef Habitat
• Standardized deployment practices for faster application delivery with Chef
• Used Chef's Automation at Scale approach to easily meet Greenway Health's business growth

From a deployment standpoint, it took less than four months to implement across the entire infrastructure.

Results
Adam Epps, Manager, Systems Engineering Operations at Greenway Health, summed up the most significant benefit of Chef in one word — flexibility:

"The Chef Habitat features provide the most agility and flexibility to target our endpoints efficiently and manage and maintain the application stack. Chef is having a strong impact on everything."

CREATED IN PARTNERSHIP WITH
Monitoring and managing a DevOps environment is complex. The volume of data generated by new distributed architectures
(such as Kubernetes) makes it difficult for DevOps teams to effectively respond to customer requests. The future of DevOps
must therefore be based on intelligent management systems. Since humans are not equipped to handle the massive volumes
of data and computing in daily operations, artificial intelligence (AI) will become the critical tool for computing, analyzing, and
transforming how teams develop, deliver, deploy, and manage applications.
With the application of MLOps principles, data scientists can focus on the core development of machine learning models while
the MLOps practices take care of tasks such as data cleaning, quality control, and model versioning.
Applying MLOps benefits business owners and clients equally. Automation increases the velocity of development, leading to
faster results and more reliable machine learning models. This means shorter development times that, in turn, bring faster
end-result delivery and cost effectiveness. Finally, automated quality control produces more reliable solutions that are verified
and tested to function as intended, reducing the risk of faulty deployments. A typical MLOps toolchain includes:
• A version control system to keep track of any changes in the datasets or the models
• A feature store to centralize data and frequently used features
• A tracker to monitor the performance of models in training
• A tool to train models using a set of optimal hyperparameters automatically
• A platform to deploy models in production
• A monitoring tool to track and govern machine learning models deployed in production
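As a small example of the tracking piece of this toolchain, the sketch below logs parameters and a metric with MLflow, one common choice for experiment tracking. It assumes a local mlflow installation with default local tracking storage, and the logged values are placeholders.

```python
# Minimal experiment-tracking sketch with MLflow (pip install mlflow).
# Assumes default local tracking storage; MLflow is one common choice
# among many, and the values below are placeholders.
import mlflow

with mlflow.start_run(run_name="baseline"):
    mlflow.log_param("learning_rate", 0.01)   # hyperparameters used
    mlflow.log_param("n_estimators", 200)
    mlflow.log_metric("val_accuracy", 0.87)   # placeholder result
# Runs are then browsable via `mlflow ui` for comparison and versioning.
```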
AIOps solutions, meanwhile, automate manual tasks such as event correlation, root cause analysis, and incident resolution, freeing
IT teams to focus on more strategic initiatives. AIOps can also help organizations achieve faster problem resolution, reduced
downtime, and improved overall IT operations efficiency. It helps teams work faster and smarter by unleashing the power of AI.
The core capabilities of AIOps that enable efficient digitization of workflows are:
1. Process optimization – Enhances efficiency throughout the enterprise by giving a comprehensive understanding of the
connections and effects between systems. After identifying a problem, it facilitates refinement and ongoing monitoring
of processes.
2. Performance analytics – Anticipates performance bottlenecks by examining trends and making necessary
improvements as needed.
3. Predictive intelligence – Utilizes machine learning to categorize incidents, suggest solutions, and proactively
alert on critical issues.
4. AI search – Offers precise, personalized answers through semantic search capabilities.
5. Configuration management database – Enhances decision-making with visibility into the IT environment by
connecting products throughout the digital lifecycle, allowing teams to comprehend impact and risk.
In practice, AIOps rests on four key techniques:
1. ML-based pattern discovery – AIOps, or IT analytics, involves identifying patterns. Machine learning leverages the
computational capability of computers to identify these patterns in IT data.
2. Anomaly detection – Unusual system behavior, such as downtime or a poor customer experience, can result from deviations
in normal behavior. AIOps enables the detection of any deviation from typical activity (a minimal illustration follows this list).
3. Predictive insights – AIOps introduces predictability in IT operations, enabling IT staff to proactively address issues before
they occur, ultimately reducing the number of service desk tickets.
4. Automated root cause analysis – Simply having insights isn't enough; it's important to take action. In traditional IT
management, staff monitor systems and take action as needed. However, with the growing volume of IT infrastructure
issues, it can be difficult for staff to manage and resolve issues in a timely manner, especially when multiple systems are
involved and root cause analysis is time-consuming. AIOps automates this process in the background.
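To illustrate the anomaly detection idea from the list above, here is a minimal z-score check over a metric series using only the standard library. Production AIOps systems learn baselines with far richer models, and the sample latencies below are fabricated for the example.

```python
# Minimal z-score anomaly check over a metric series (stdlib only).
# Production AIOps tools use far richer models; the data here is made up.
from statistics import mean, stdev

latencies_ms = [102, 98, 110, 105, 99, 101, 97, 103, 340, 100]  # fabricated samples

mu, sigma = mean(latencies_ms), stdev(latencies_ms)
for i, value in enumerate(latencies_ms):
    z = (value - mu) / sigma
    if abs(z) > 2.0:  # simple threshold; real systems learn baselines
        print(f"sample {i}: {value} ms looks anomalous (z={z:.1f})")
```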
AIOPS TOOLSET
AIOps tools gather data from multiple sources to provide a comprehensive view of IT operations. They collect data such as
application logs and measure system performance, breaking down silos of IT information and bridging the gap between
software, hardware, and cloud issues. AIOps solutions aid IT operations by providing tools for root cause analysis, event
correlation, and cloud mapping to support automation:
1. Intelligent observability – AIOps employs advanced monitoring techniques with the use of contextual information,
AI, and automation to gain a complete understanding of IT issues. Precise root cause analysis with actionable
insights is provided.
2. Continuous automation – Reduces manual effort in deployment, configuration, and management and automatically
identifies and assesses the severity of issues in terms of user and business impact. Achieving continuous discovery,
effortless deployments, and automatic dependency mapping is made possible.
3. AI assistance – Performs efficient and error-free root cause analysis. Precise and reproducible results are achieved with
the AI engine integrated into every aspect.
MLOps and AIOps both aim to serve the same end goal: business automation. While MLOps bridges the gap between
model building and deployment, AIOps focuses on supporting and reacting to issues in real time and providing analytics to
the operations team. AIOps combines big data and machine learning to automate performance monitoring, event analysis,
correlation, and IT automation.
There are parallels in the teams and abilities needed to properly execute AIOps and MLOps, despite the obvious distinctions. It
is worthwhile to consider where they intersect to determine which resources can support both disciplines.
Conclusion
Organizations throughout the world are increasingly looking to automation technologies as a means of improving operational
efficiency. This indicates that tech leaders are becoming more and more interested in MLOps and AIOps.
Machine learning systems can simplify the collection of data from various parts of the DevOps system, such as velocity, defects
found, and burn rate. MLOps takes care of the continuous integration and deployment of the models. It allows users to shed light
on important patterns and exploit data to extract meaningful information. It also implies monitoring and continuous model
retraining in production to ensure the reliability and stability of those models.
AIOps can play a crucial role in accelerating DevOps efficiency. It is defined as the usage of big data and machine learning to
automate operations such as event correlation, determining cause and effect, and identifying unusual events.
In other words, MLOps and AIOps can work together. Artificial intelligence will help boost performance by enabling instant
development and operations cycles, and by delivering a compelling customer experience on these features. Machine learning
will enable companies to gather metrics such as the number of integrations, the time between them, their success rate, and
defects per integration, which are only valuable when they are accurately evaluated and correlated.
Hicham is an engineer with about eight years of experience working as a quality assurance engineer, DevOps
engineer, and cloud architect. Always on the lookout for new challenges and problems to solve, but also
keen on sharing his knowledge, he decided to co-found KumoMind with Nicolas Giron to write tech-
related content for the community.
Nicolas is an IT engineer with over 10 years of experience as a developer, cloud architect, DevOps engineer, and
SRE. His background and curiosity have allowed him to develop his technical skills in different fields
beyond his areas of expertise. As co-founder of KumoMind with Hicham Bouissoumer, they aim to share
their deep expertise in open-source technologies, cloud computing, and emerging technologies.
Continuous Integration Patterns and Anti-Patterns
Reap the full benefits of enhanced code quality, better testing practices, and early error detection with proper implementation of CI processes. This Refcard explains detailed patterns and anti-patterns for core areas of CI, including version control, the build stage, pipeline monitoring, documentation, as well as communication and collaboration across teams and within the organization.

Getting Started With CI/CD Pipeline Security
The increasingly distributed nature of CI/CD frameworks has made organizations more vulnerable to attacks. In this Refcard, you'll learn about the primary focus areas of CI/CD pipeline security, review common pipeline threats and security challenges, as well as walk through seven steps to get started with securing your pipelines.

DevOps: CI/CD and Application Release Orchestration
In DZone's 2021 DevOps Trend Report, we provide insight into how CI/CD has revolutionized automated testing, offer advice on why an SRE is important to CI/CD, explore the differences between managed and self-hosted CI/CD, and more. The goal is to offer guidance on how to best adopt DevOps practices to help scale the productivity of teams.

CI/CD: Automation for Reliable Software Delivery
In 2020, DevOps became more crucial than ever as companies moved to distributed work and accelerated their push toward cloud-native and hybrid infrastructures. Our Trend Report examines what this acceleration looked like for development teams across the globe and dives deeper into the latest DevOps practices that are advancing CI, CD, and release automation.
Solutions Directory
This directory contains tools for automation, IaC, pipelines, testing, and more to help manage application
development and deployment, CI/CD pipelines, and DevOps practices. It provides pricing data and
product category information gathered from vendor websites and project pages. Solutions are selected
for inclusion based on several impartial criteria, including solution maturity, technical innovativeness,
relevance, and data availability.
Company | Product | Purpose | Availability | Website
Progress Chef | Chef | Configuration management DevOps tool | Open source | chef.io/products
Sauce Labs | Sauce Cross-Browser | Browser testing across browser and OS combinations | |
AccelQ | Automate Mobile | No-code mobile test automation | Trial period | accelq.com/products/test-automation-mobile
AccelQ | Automate Web | Test automation tool for web, API, mobile, desktop, and more | Trial period | accelq.com/products/test-automation-web
Apache Software Foundation | Apache Ant | Java library and command-line tool | Open source | ant.apache.org
AutoRABIT | ARM | CI/CD for Salesforce | | autorabit.com/products/automated-release-management
AutoRABIT | Vault | Data protection for Salesforce | | autorabit.com/products/vault-data-backup-recovery
Basis Technologies | ActiveControl | DevOps automation for SAP | By request | basistechnologies.com/products/activecontrol
Basis Technologies | Testimony | Robotic test automation for SAP | By request | basistechnologies.com/products/testimony
Broadcom | Automic® Continuous Delivery Automation | Real-time app deployment monitoring and management | By request | broadcom.com/products/software/continuous-delivery/automic-continuous-delivery-automation
Broadcom | Continuous Delivery Director | Release planning and management | By request | broadcom.com/products/software/continuous-delivery/automic-continuous-delivery-director
Cerberus Testing | Cerberus Testing | Scalable test automation platform | Free tier | cerberus-testing.com
D2iQ | D2iQ Kubernetes Platform | Kubernetes platform | Trial period | d2iq.com/kubernetes-platform
D2iQ | DKP Enterprise | Kubernetes platform | Trial period | d2iq.com/products/enterprise
Flexagon | FlexDeploy | DevOps value stream delivery platform | By request | flexagon.com/flexdeploy/devops-vsdp-platform
Google | Cloud Deployment Manager | Infrastructure deployment | Trial period | cloud.google.com/deployment-manager/docs
Graham Campbell Technology | StyleCI | Continuous integration service that enforces code style preferences | By request | styleci.io
IBM | Cloud Continuous Delivery | UI- and CLI-based DevOps workflows | Trial period | ibm.com/cloud/continuous-delivery
IBM | Cloud Pak | DevOps management with AI analysis | By request | ibm.com/products/cloud-pak-for-watson-aiops
IBM | Engineering Lifecycle Management | Software for product and application lifecycle management | Trial period | ibm.com/products/engineering-lifecycle-management
IBM | Instana | Application performance monitoring | Trial period | ibm.com/products/instana
InfluxData | InfluxDB | Kubernetes monitoring solution | Trial period | influxdata.com/solutions/kubernetes-monitoring-solution
InfluxData | Telegraf | Data collection agent | Free tier | influxdata.com/telegraf
Knapsack Pro | Knapsack Pro | CI infrastructure test management | Free tier | knapsackpro.com
LaunchDarkly | LaunchDarkly | Feature flag and toggle management | Trial period | launchdarkly.com
Micro Focus | CODAR | Application release automation | By request | microfocus.com/en-us/products/codar-continuous-deployment/overview
Micro Focus | Hybrid Cloud Management | Continuous delivery and release automation | By request | microfocus.com/en-us/products/hybrid-cloud-management-native/overview
Micro Focus | Operations Bridge | AIOps | By request | microfocus.com/en-us/products/operations-bridge
OutSystems | OutSystems | Low code for app development | Free tier | outsystems.com/low-code-platform
Prodly | Prodly DevOps | Salesforce DevOps automation | By request | prodly.co/products/salesforce-devops
Red Hat | Ansible Automation Platform | IT automation | By request | ansible.com
VSoft Technologies | Continua CI | Scalable continuous integration server | Free tier | finalbuilder.com/continua-ci
Weaveworks | Weave GitOps Core | Continuous delivery for Kubernetes | Open source | weave.works/product/gitops-core