Production Ready Checklists For Kubernetes v3
Production readiness is a term you hear a lot, and depending on who you are talking to and what
they are doing, it can mean different things.
Readiness is dependent on your use case and can be about making tradeoffs. Although a cluster
can be production ready when it’s good enough to serve traffic, many agree that there is a
minimum set of requirements you need before you can safely declare your cluster ready for
commercial traffic.
The first checklist describes all of the things you need to include in your app before it gets
deployed to Kubernetes. The second checklist outlines the infrastructure you need to have in
place before running production traffic on a cluster.
Readiness check
  What it is: Endpoints for Kubernetes to monitor your application lifecycle.
  Why you need it: Allows Kubernetes to restart or stop traffic to a pod. Readiness failure is transient and tells Kubernetes to route traffic elsewhere.
  Options: Read the post: Resilient apps with Liveness and Readiness probes
Liveness check
  What it is: Endpoints for Kubernetes to monitor your application lifecycle.
  Why you need it: Liveness failure is for telling Kubernetes to restart the pod.
  Options: Read: Resilient apps with Liveness and Readiness probes
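
The readiness and liveness checks above could be sketched roughly as follows, using only the Python standard library. The paths `/healthz` and `/ready`, the port, and the `ready` flag are assumptions made for illustration; Kubernetes lets you configure whichever paths your probes call.

```python
import http.server
import threading

# Hypothetical flag: set it once startup work (cache warm-up, connections) is done.
ready = threading.Event()


def check_status(path: str) -> int:
    """Return the HTTP status code for a probe path."""
    if path == "/healthz":
        # Liveness: the process is up and able to answer requests.
        return 200
    if path == "/ready":
        # Readiness: 503 is transient and only takes the pod out of rotation.
        return 200 if ready.is_set() else 503
    return 404


class ProbeHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(check_status(self.path))
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Blocking call; start it from your application's entry point."""
    http.server.HTTPServer(("", port), ProbeHandler).serve_forever()
```

Keeping the two endpoints separate matters: a failed liveness check gets the pod restarted, while a failed readiness check only stops traffic until the check passes again.
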
Metric instrumentation
  What it is: Code and libraries used in your code to expose metrics.
  Why you need it: Allows measuring operation of application and enables many more advanced use cases.
  Options: Prometheus, New Relic, Datadog and others. Read: Monitoring Kubernetes with Prometheus
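
To give a feel for what metric instrumentation involves, here is a minimal sketch of a counter rendered in the Prometheus text exposition format. The `Counter` class and the metric name are hypothetical and written from scratch for illustration; in practice you would most likely use an official client library for your monitoring system.

```python
import threading


class Counter:
    """A thread-safe, monotonically increasing counter (illustrative only)."""

    def __init__(self, name: str, help_text: str):
        self.name = name
        self.help_text = help_text
        self._value = 0
        self._lock = threading.Lock()

    def inc(self, amount: int = 1) -> None:
        with self._lock:
            self._value += amount

    def render(self) -> str:
        """Render the counter in the Prometheus text exposition format."""
        return (
            f"# HELP {self.name} {self.help_text}\n"
            f"# TYPE {self.name} counter\n"
            f"{self.name} {self._value}\n"
        )


# Hypothetical metric: count every request the application handles.
requests_total = Counter("http_requests_total", "Total HTTP requests handled.")
requests_total.inc()
requests_total.inc()
print(requests_total.render())
```

Serving the rendered text from an HTTP endpoint is what lets a scraper such as Prometheus collect it.
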
Dashboards
  What it is: View of metrics.
  Why you need it: You need to understand the data.
  Options: Grafana and many other options.
Playbooks and Runbooks
  What it is: Rich guides for your engineers on how to operate the system and find faults when things go wrong.
  Why you need it: Nobody is at their sharpest at 03:00 AM. Knowledge deteriorates over time.
  Options: Confluence. Markdown files. Weave Cloud Notebooks.
Limits and requests
  What it is: Explicit resource allocation for pods.
  Why you need it: Allows Kubernetes to make good scheduling decisions.
  Options: Read: Kubernetes Pod Resource Limitations and Quality of Service
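
As a rough illustration of how requests and limits feed into Quality of Service, the sketch below classifies a pod from its container specs. It is a simplification, not the scheduler's actual code: it compares quantities as plain strings (so `"500m"` and `"0.5"` would wrongly count as different), and the function name `qos_class` is hypothetical.

```python
def qos_class(containers: list) -> str:
    """Approximate the QoS class of a pod from its containers' resources."""
    all_guaranteed = True
    any_set = False
    for container in containers:
        resources = container.get("resources", {})
        requests = resources.get("requests", {})
        limits = resources.get("limits", {})
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # A missing request defaults to the limit when a limit is set.
            request = requests.get(resource, limit)
            if limit is not None or request is not None:
                any_set = True
            if limit is None or request != limit:
                all_guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if all_guaranteed else "Burstable"


# Hypothetical container spec with requests lower than limits.
burstable = [{
    "name": "api",
    "resources": {
        "requests": {"cpu": "250m", "memory": "64Mi"},
        "limits": {"cpu": "500m", "memory": "128Mi"},
    },
}]
print(qos_class(burstable))
```

Pods with no requests or limits at all land in the lowest class and are typically the first to be evicted under memory pressure, which is one reason to set them explicitly.
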
Labels and annotations
  What it is: Metadata held by Kubernetes.
  Why you need it: Makes workload management easier and allows other tools to work with standard Kubernetes definitions.
  Options: Read: Labels and Selectors in Kubernetes
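
To show why labels make workload management easier, here is a small sketch of equality-based selector matching. The pod names and label values are made up, and the `matches` and `select` helpers are hypothetical; set-based selectors (`in`, `notin`) are left out.

```python
def matches(selector: dict, labels: dict) -> bool:
    """True when every key/value pair in the selector is present in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


# Hypothetical pods and their labels.
pods = {
    "api-7d9f": {"app": "api", "env": "prod"},
    "api-canary": {"app": "api", "env": "staging"},
    "worker-1": {"app": "worker", "env": "prod"},
}


def select(selector: dict) -> list:
    """Return the names of all pods whose labels satisfy the selector."""
    return sorted(name for name, labels in pods.items() if matches(selector, labels))


print(select({"app": "api", "env": "prod"}))
```

Because tools select workloads by label rather than by name, a consistent labelling scheme is what lets them operate on the right set of pods.
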
Alerts
  What it is: Automated notifications on defined trigger.
  Why you need it: You need to know when your service degrades.
  Options: Prometheus and Alertmanager.
Structured logging output
  What it is: Output logs in a machine-readable format to facilitate searching and indexing.
  Why you need it: Trace what went wrong when something does.
  Options: ELK stack (Elasticsearch, Logstash and Kibana). Many commercial offerings.
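
One way to produce machine-readable logs is to emit one JSON object per line, as in the sketch below. The field names and the logger name `checkout` are assumptions; pick whatever schema your log pipeline expects.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Format each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def build_logger(name: str) -> logging.Logger:
    """Create a logger that writes JSON lines to stdout."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


log = build_logger("checkout")  # hypothetical service name
log.info("order placed")
```

Writing to stdout rather than to a file keeps the application unaware of where logs end up, which leaves collection to the cluster.
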
Tracing instrumentation
  What it is: Instrumentation to send request processing details to a collection service.
  Why you need it: Sometimes the only way of figuring out where latency is coming from.
  Options: Zipkin, Lightstep, Appdash, Tracer, Jaeger
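
The idea behind tracing instrumentation can be sketched with a timing context manager. This is not the API of any of the tools listed above: the `span` helper is hypothetical, and the `finished_spans` list stands in for a real collection service.

```python
import contextlib
import time
import uuid

# Stand-in for the collection service; a real tracer would send these over the network.
finished_spans = []


@contextlib.contextmanager
def span(name: str, trace_id: str):
    """Time a block of work and record it under the given trace."""
    start = time.perf_counter()
    try:
        yield
    finally:
        finished_spans.append({
            "trace_id": trace_id,
            "name": name,
            "duration_ms": (time.perf_counter() - start) * 1000,
        })


# One trace ID ties together all the spans produced while handling a request.
trace_id = uuid.uuid4().hex
with span("handle_request", trace_id):
    with span("query_database", trace_id):
        time.sleep(0.01)  # placeholder for real work
```

Comparing the duration of the inner span with the outer one shows how much of the request's latency the database call accounts for.
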
Graceful shutdowns
  What it is: Applications respond to SIGTERM correctly.
  Why you need it: This is how Kubernetes will tell your application to end.
  Options: Read: 10 tips for Building and Managing Containers
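
A minimal sketch of responding to SIGTERM, assuming a simple worker loop. The names `install_handler`, `run` and `do_work` are hypothetical; the point is that the handler only sets a flag and the loop finishes its current unit of work before exiting.

```python
import signal
import threading

shutdown_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Signal handler: record the request and let the main loop wind down."""
    shutdown_requested.set()


def install_handler() -> None:
    """Call once from the main thread during startup."""
    signal.signal(signal.SIGTERM, handle_sigterm)


def run(do_work, idle_seconds: float = 1.0) -> None:
    """Repeat do_work until shutdown is requested, then return for cleanup."""
    while not shutdown_requested.is_set():
        do_work()
        # Sleep between iterations, but wake immediately on shutdown.
        shutdown_requested.wait(idle_seconds)
    # Close connections and flush buffers here before the process exits.
```

Kubernetes follows SIGTERM with SIGKILL once the pod's termination grace period expires, so cleanup needs to finish within that window.
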
Graceful dependency (w. Readiness check)
  What it is: Applications don’t assume dependencies are available. Wait for other services before reporting ready.
  Why you need it: Avoid headaches that come with a service order requirement.
  Options: Read: 10 tips for Building and Managing Containers
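
Waiting for a dependency before reporting ready might look like the sketch below. The TCP reachability check, the retry count and the backoff values are all assumptions chosen for illustration; a real check would usually exercise the dependency's own protocol.

```python
import socket
import time


def dependency_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the dependency can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for(check, attempts: int = 5, base_delay: float = 0.5) -> bool:
    """Retry check() with exponential backoff; True as soon as it succeeds."""
    delay = base_delay
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
            delay *= 2
    return False
```

A readiness endpoint could return a failure status until something like `wait_for(lambda: dependency_reachable("db", 5432))` succeeds, where the host `db` and port `5432` are placeholders. That way services can start in any order.
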
Configmaps
  What it is: Define a configuration file for your application in Kubernetes using configmaps.
  Why you need it: Easy to reconfigure an app without rebuilding, allows config to be versioned.
  Options: Read: Best Practices for Designing and Building Containers for Kubernetes
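
On the application side, consuming a configmap that is mounted as a file can be as simple as the sketch below. The JSON format, the default values and the example mount path are assumptions; a configmap can hold any text format.

```python
import json
import os

# Hypothetical defaults, used when the mounted file omits a key.
DEFAULTS = {"log_level": "info", "cache_ttl_seconds": 60}


def load_config(path: str) -> dict:
    """Merge settings from a mounted config file over the built-in defaults."""
    config = dict(DEFAULTS)
    if os.path.exists(path):
        with open(path, encoding="utf-8") as handle:
            config.update(json.load(handle))
    return config


# Example mount path (an assumption); falls back to defaults if the file is absent.
config = load_config("/etc/app/config.json")
print(config["log_level"])
```

Because the file lives outside the image, changing a setting means updating the configmap rather than rebuilding the container.
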
Labeled images using commit SHA
  What it is: Label the docker images with the code commit SHA.
  Why you need it: Makes tracing image to code trivial.
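
A build script could derive the image tag from the commit SHA along these lines. The repository name is a placeholder and the `image_reference` helper is hypothetical; the SHA itself would typically come from `git rev-parse HEAD` or from a variable your CI system provides.

```python
import re


def image_reference(repository: str, commit_sha: str) -> str:
    """Build an image reference tagged with a Git commit SHA."""
    # Accept abbreviated (7+) or full (40) lowercase hexadecimal SHAs only.
    if not re.fullmatch(r"[0-9a-f]{7,40}", commit_sha):
        raise ValueError(f"not a commit SHA: {commit_sha!r}")
    return f"{repository}:{commit_sha}"


# Placeholder repository and SHA for illustration.
print(image_reference("registry.example.com/team/api", "3f2a9c1"))
```

Rejecting anything that is not a SHA keeps mutable tags such as `latest` out of the pipeline, so every running image can be traced to one commit.
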
Build pipeline
  What it is: The CI portion of the CI/CD pipeline. Tests, integrates and builds your container artefact. Artefacts should be tagged with the Git commit SHA to verify provenance.
  Why you need it: To deposit a clean build artefact in the container registry.
  Options: CircleCI, Travis, Jenkins and others.
Deployment pipeline
  What it is: The deployment portion of the CI/CD pipeline. Takes the build artefacts and delivers them to the cluster.
  Why you need it: More secure way of doing deployment. Can add approval process if necessary.
  Options: Weave Cloud and Flux
Image registry
  What it is: Stores build artefacts. Need credentials for CI to push and for cluster to pull images.
  Why you need it: Keep versioned artefacts available.
  Options: Roll your own. Commercial: DockerHub, JFrog, or GCP Registry.
Monitoring infrastructure
  What it is: Collects and stores metrics.
  Why you need it: Understand your running system. Get alerts when something goes wrong.
  Options: OSS: Prometheus, Cortex, Thanos. Commercial: Datadog, Grafana Cloud, Weave Cloud
Shared storage
  What it is: Store persistent state of your application beyond the pod’s lifetime.
  Why you need it: No one has a stateless app.
  Options: Many. Depends on the platform.
Secrets management
  What it is: How do your applications access secret credentials, securely?
  Why you need it: Secrets are required to access external services.
  Options: Bitnami Sealed Secrets, Hashicorp Vault
Ingress controller
  What it is: Common routing point for all inbound traffic.
  Why you need it: Easier to manage authentication and logging.
  Options: Platform controller (AWS ELB); Google Compute Engine (GCE) & NGINX (Kubernetes)
API Gateway
  What it is: Single point for incoming requests. Higher layer ingress controller that can replace an ingress controller.
  Why you need it: Can route at HTTP level. Enables common and centralised tooling for tracing, logging, authentication.
  Options: Ambassador (Envoy). Roll your own.
Service mesh
  What it is: Additional layer on top of Kubernetes to manage routing.
  Why you need it: Enables complex use cases like Progressive Delivery. Adds interservice TLS, load balancing, service discovery, monitoring and tracing.
  Options: Linkerd, Istio
Service catalogue / Broker
  What it is: Enables easy dependencies on services and service discovery for your team.
  Why you need it: Simplifies deploying applications.
  Options: Kubernetes’ service catalog API
Network policies
  What it is: Rules on allowed connections and services. Needs a CNI plugin.
  Why you need it: Prevent unauthorised access, improve security, segregate namespaces.
  Options: Weave Net, Calico
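
As an illustration of a network policy, the sketch below builds a default-deny ingress policy for a namespace and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. The namespace `payments` and the policy name are placeholders, and the policy only takes effect when the cluster's CNI plugin enforces network policies.

```python
import json


def default_deny_ingress(namespace: str) -> dict:
    """Build a NetworkPolicy that blocks all inbound traffic to pods in a namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-ingress", "namespace": namespace},
        "spec": {
            # An empty podSelector selects every pod in the namespace.
            "podSelector": {},
            # Listing Ingress with no ingress rules denies all inbound traffic.
            "policyTypes": ["Ingress"],
        },
    }


print(json.dumps(default_deny_ingress("payments"), indent=2))
```

Starting from deny-all and then adding narrow allow rules per service is a common way to segregate namespaces.
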
Authorization integration
  What it is: API level integration into the Kubernetes auth flow.
  Why you need it: Uses existing SSO to reduce the number of accounts and to centralize account management.
  Options: Requires custom work.
Image scanning
  What it is: Automated scanning for vulnerabilities in your container images. Implement in the CI pipeline.
  Why you need it: Because CVEs happen.
  Options: Docker, Snyk, Twistlock, Sonatype, Aqua Security
Log aggregation
  What it is: Bring all logs from your applications into a searchable place.
  Why you need it: Logs are the best source of information on what went wrong.
  Options: Many options. Fluentd or the ELK (Elasticsearch, Logstash, Kibana) stack are good bets for roll-your-own.