DevOps_q_and_a2
Monitoring infrastructure
Writing/maintaining IaC
Improving pipelines
Handling releases
Supporting dev teams
Troubleshooting incidents
Scenario style
“On a typical day, I monitor production clusters, manage and improve our Jenkins pipelines,
work on Terraform modules for resource provisioning, and collaborate with developers for
application releases. I also handle incidents like pod failures or networking issues.”
“In my last project, EC2 instances in private subnets needed to pull OS updates from the
internet. I placed a NAT gateway in a public subnet and pointed the private subnets' default
route at it, so the instances stayed unreachable from inbound internet traffic.”
Advanced Q&A
Q: Can a NAT Gateway receive inbound traffic?
A: No. It only handles outbound connections initiated from private subnets, plus their return traffic.
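The routing behind this answer can be sketched in a few lines. This is only an illustration, assuming a 10.0.0.0/16 VPC; the route tables and target labels below are hypothetical and are not real AWS resource IDs. It shows that a private subnet's default route points at the NAT gateway, and that the most specific matching route wins.

```python
import ipaddress

# Hypothetical route tables; targets are plain labels, not real AWS IDs.
PRIVATE_ROUTES = {"10.0.0.0/16": "local", "0.0.0.0/0": "nat-gateway"}
PUBLIC_ROUTES = {"10.0.0.0/16": "local", "0.0.0.0/0": "internet-gateway"}


def next_hop(routes, destination):
    """Return the target of the most specific route that matches destination."""
    dest = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in routes.items()
        if dest in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None
    # The longest prefix (most specific route) wins.
    _, target = max(matches, key=lambda match: match[0].prefixlen)
    return target


print(next_hop(PRIVATE_ROUTES, "10.0.3.7"))      # traffic inside the VPC
print(next_hop(PRIVATE_ROUTES, "151.101.1.1"))   # outbound internet traffic
print(next_hop(PUBLIC_ROUTES, "151.101.1.1"))
```

Nothing in the private route table gives the internet a path to initiate a connection to the instances, which is why the answer above is "No".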
Best practices
Backup etcd
Drain nodes
Check deprecated APIs
Update kubectl
Test in staging
✅ Scenario style
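“We planned a Kubernetes upgrade from v1.24 to v1.27, stepping through each minor version
because skipping minor versions is not supported. First, we backed up etcd, verified
compatibility of Ingress controllers, and checked for deprecated APIs. Then we upgraded the
control plane, and drained and upgraded worker nodes in a rolling manner.”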
🧠 Easy to remember: Backup, Drain, Check, Test
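When planning an upgrade that spans several releases, it can help to list the intermediate steps first, since kubeadm-managed clusters are upgraded one minor version at a time. The helper below is a hypothetical sketch, not part of any Kubernetes tooling; it assumes version strings of the form v1.24.

```python
def upgrade_path(current, target):
    """List each minor version to step through between current and target."""
    cur_major, cur_minor = (int(p) for p in current.lstrip("v").split(".")[:2])
    tgt_major, tgt_minor = (int(p) for p in target.lstrip("v").split(".")[:2])
    if cur_major != tgt_major or tgt_minor < cur_minor:
        raise ValueError("only forward upgrades within one major version are handled")
    # One hop per minor release, ending at the target.
    return [f"v{cur_major}.{minor}" for minor in range(cur_minor + 1, tgt_minor + 1)]


print(upgrade_path("v1.24", "v1.27"))
```

Each hop would then get its own backup, drain, check, and test cycle.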
✅ Scenario style
1. “During cluster maintenance, I set a PDB with minAvailable: 2 on the application's pods so
that evictions never leave fewer than 2 running and user traffic isn’t impacted.”
2. “We needed to upgrade the node pool for a production app. We set a PDB to minAvailable: 2
to keep at least 2 pods online even during the drain.”
PDB
+--> minAvailable: 2
+--> maxUnavailable: 1
Advanced Q&A
Q: Does PDB protect from node failures?
A: No, PDB only controls voluntary disruptions (e.g., drain, upgrades).
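The arithmetic behind a PDB can be sketched as follows. This is a simplified model that assumes integer values only (real PDBs also accept percentages), and the function name is made up for illustration. It computes how many pods may be evicted voluntarily right now.

```python
def disruptions_allowed(healthy, expected, min_available=None, max_unavailable=None):
    """Return how many healthy pods may be evicted without violating the budget."""
    # A single PDB sets one of the two fields, not both.
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of min_available or max_unavailable")
    if min_available is not None:
        desired_healthy = min_available
    else:
        desired_healthy = expected - max_unavailable
    return max(0, healthy - desired_healthy)


# Three replicas, all healthy, minAvailable: 2 -> one pod may be evicted.
print(disruptions_allowed(healthy=3, expected=3, min_available=2))
# One pod is already down -> the drain has to wait.
print(disruptions_allowed(healthy=2, expected=3, min_available=2))
```

With 3 replicas, minAvailable: 2 and maxUnavailable: 1 give the same result; they differ once the replica count changes.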
Best practices
VPC CIDR
Public & private subnets
NAT gateway
Internet gateway
Route tables
Security groups
✅ Scenario style
“We designed a VPC with /16 CIDR, split into multiple private subnets for application servers,
with a NAT gateway for outbound internet, public subnets for the ALB, and separate security
groups for database layers.”
“Our VPC has a /16 CIDR block, split into public subnets for the ALB and private subnets for
EC2/EKS nodes. NAT gateways handle outbound traffic for the private subnets. RDS is placed
in a private subnet.”
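A /16 split like the one described can be worked out with Python's standard ipaddress module. The 10.0.0.0/16 range, the /24 subnet size, and the subnet names are assumptions chosen for illustration; the scenarios above do not specify them.

```python
import ipaddress

vpc = ipaddress.ip_network("10.0.0.0/16")    # assumed VPC range
subnets = list(vpc.subnets(new_prefix=24))   # 256 blocks of /24

# Hypothetical layout: low blocks for public subnets, a separate range for private.
plan = {
    "public-a": subnets[0],
    "public-b": subnets[1],
    "private-a": subnets[10],
    "private-b": subnets[11],
}

for name, network in plan.items():
    # AWS reserves 5 addresses in every subnet (the first four and the last one).
    print(name, network, "usable:", network.num_addresses - 5)
```

Leaving gaps between the public and private ranges keeps room to add subnets per availability zone later without renumbering.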
Advanced Q&A
Q: Why put RDS in a private subnet?
A: For security. The database has no direct route to or from the internet.
Best practices
separate public/private
follow least privilege on security groups
Q. How is your CI/CD pipeline set up? What security tools are integrated?
Pointers
Jenkins/GitLab
Docker builds
SonarQube (code scan)
Trivy/Anchore (image scan)
HashiCorp Vault (secrets)
✅ Scenario style
“Our pipeline is on GitLab CI, running Docker builds, security scanning with Trivy, static code
analysis with SonarQube, and uses Vault to inject secrets. This ensures secure, consistent,
automated releases.”
Advanced Q&A
Q: How do you manage secrets?
A: Vault or SSM Parameter Store, never hard-coded.
Best practices
✅ Scenario style
“We manage pipelines through version-controlled YAML, infrastructure with Terraform, and
RBAC controls in Kubernetes to delegate least privilege.”
Best practices
✅ Scenario style
“In a microservice build, we use a Golang builder image, compile binaries, and then copy them
to a scratch image in a second stage. That keeps the production image minimal.”
Advanced Q&A
Q: Why use multi-stage?
A: Reduce attack surface and image size.
Best practices
✅ Scenario style
“We manage Kubernetes manifests for deployments and services in a GitOps workflow to
apply them consistently across environments.”
“We store Deployment and Service YAMLs in Git repos. We apply them with kubectl or
FluxCD.”
Advanced Q&A
Q: How to manage multiple environments?
A: Use Kustomize or Helm.
Best practices
version manifests
keep separate folders for dev/prod
✅ Scenario style
“We use Ansible Vault to encrypt DB passwords in our inventory, and decrypt only during
runtime with a vault password file.”
Advanced Q&A
Q: What if you lose the vault password?
A: The encrypted data cannot be recovered, so store the vault password securely.
Best practices
✅ Scenario style
“We deployed 3 control plane nodes with an external HA load balancer and spread worker
nodes in 3 AZs to achieve high availability.”
Advanced Q&A
Q: How do you handle etcd failure?
A: Run an odd number of etcd members so the cluster keeps quorum when one fails, and take frequent snapshots so it can be restored.
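The reason for an odd member count follows from the quorum rule: a majority of members must agree. A small worked calculation, with function names chosen here for illustration:

```python
def quorum(members):
    """Smallest majority of an etcd cluster with this many members."""
    return members // 2 + 1


def failures_tolerated(members):
    """How many members can fail while a majority is still reachable."""
    return members - quorum(members)


for n in (1, 3, 4, 5):
    print(f"members={n} quorum={quorum(n)} tolerated={failures_tolerated(n)}")
```

Going from 3 to 4 members raises the quorum to 3 but still tolerates only 1 failure, so the extra member adds cost without adding resilience.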
Best practices
Prometheus, Grafana
Alertmanager
Common errors: CrashLoopBackOff, ImagePullBackOff
✅ Scenario style
“We use Prometheus + Grafana for metrics and Alertmanager for notifications. The most
common pod issue I handled was CrashLoopBackOff caused by misconfigured ConfigMaps or
missing Secrets.”
Advanced Q&A
Q: What’s CrashLoopBackOff?
A: The container keeps crashing after it starts, and the kubelet waits longer before each restart. Bad configuration is a common cause.
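The "BackOff" part of the name refers to the growing delay between restarts. The sketch below assumes the kubelet's documented defaults (a 10 second starting delay that doubles and is capped at 5 minutes); the function itself is illustrative and not taken from Kubernetes code.

```python
def restart_delays(restarts, base=10, cap=300):
    """Seconds waited before each restart: doubles each time, capped at `cap`."""
    return [min(base * 2 ** attempt, cap) for attempt in range(restarts)]


print(restart_delays(7))
```

This is why a pod stuck in CrashLoopBackOff can take several minutes to retry after the underlying ConfigMap or Secret has been fixed; deleting the pod forces an immediate fresh start.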
Best practices
set up alerts
test alert receivers regularly
Best practices
“S3 scales virtually without limit. A single bucket can store billions of objects.”
Advanced Q&A
Q: Any hard limits?
A: There is no limit on the number of objects in a bucket. The limits that matter in practice are per-prefix request rates and the maximum size of a single object.
Best practices
✅ Scenario style
“I assigned an IAM role to an EC2 instance with a policy to only allow S3 access for backup
storage.”
Advanced Q&A
Q: Difference between policy and role?
A: Role = identity; Policy = permission rules.
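To make the policy half of that answer concrete, here is a sketch of a least-privilege policy document for the backup scenario, built as a Python dictionary and printed as JSON. The bucket name is hypothetical, and the exact set of actions is an assumption about what a backup job needs.

```python
import json

BUCKET = "example-backup-bucket"  # hypothetical bucket name

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Object-level access: upload and read backups.
            "Effect": "Allow",
            "Action": ["s3:PutObject", "s3:GetObject"],
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
        },
        {
            # Bucket-level access: list existing backups.
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": f"arn:aws:s3:::{BUCKET}",
        },
    ],
}

print(json.dumps(policy, indent=2))
```

The policy is attached to the role, and the role is attached to the EC2 instance through an instance profile, so no access keys are stored on the machine.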
Best practices
Build outputs
Stored in artifact repositories like Artifactory or S3
✅ Scenario style
“Our CI pipeline pushes JAR artifacts to Artifactory after a successful Maven build.”
Advanced Q&A
Q: Why store them?
A: To enable rollback or re-deploys.
Best practices
SATS → DATS
✅ Scenario style
“After deploying, we perform SATS to validate application behavior, and DATS to verify data
correctness with staging data.”
Advanced Q&A
Q: Are they manual or automated?
A: Usually manual with automated test cases integrated.
Best practices
Logs
CI/CD dashboard
Alerts
Test reports
✅ Scenario style
“Whenever the pipeline fails, I check Jenkins logs, review the failing stage, and correlate with
Git commit changes.”
Advanced Q&A
Q: How do you get notified?
A: Slack or email from pipeline notifications.
Best practices
✅ Scenario style
“We created roles for installing Nginx, managing users, and deploying apps, to keep our
playbooks DRY and modular.”
Advanced Q&A
Q: Benefits of roles?
A: Reusable, maintainable, clean structure.
Best practices
structure roles with defaults, tasks, handlers
version control them
Thanks Everyone!
Connect with me: Amit Singh