Role: Senior Site Reliability Engineer (SRE)
Location: Remote
Positions: 2
Experience: 10–12+ Years
Visa: Visa-independent candidates only
Mandatory Requirements (Screening Filter)
- Strong, hands-on Go (Golang) programming skills
- Strong Kubernetes (K8s) experience
- Candidate must be comfortable with both coding and automation
- LinkedIn profile and a copy of the candidate's DL are mandatory
- A candidate photo or screenshot is required at submission
Job Summary
We are looking for a Senior SRE with strong expertise in AWS, Kubernetes, and Golang, focused on building reliable, scalable, and automated systems.
Key Responsibilities
Reliability & Performance
- Design monitoring & alerting systems using CloudWatch, Grafana, Prometheus, Datadog, ELK
- Maintain SLIs/SLOs, error budgets, and system performance
- Implement auto-scaling, health checks, and self-healing systems
- Perform root cause analysis (RCA) and post-incident reviews
Automation & DevOps
- Build infrastructure using Terraform, Ansible, CloudFormation
- Develop CI/CD pipelines (GitHub Actions, GitLab CI, Jenkins)
- Manage workloads on Kubernetes, ECS, EKS, and Lambda
- Implement blue/green and canary deployments with automated rollbacks
Incident Management
- Participate in 24/7 on-call rotation
- Reduce MTTD & MTTR using automation
- Create runbooks & operational playbooks
Security & Compliance
- Implement secure DevOps practices
- Ensure compliance with ISO 27001 and SOC 2
- Manage IAM, secrets, and networking securely
Collaboration
- Work with developers on scalable & reliable system design
- Drive DevOps & SRE best practices
- Contribute to platform improvements
Required Skills
- 6+ years in SRE / DevOps / Infrastructure
- Strong AWS: EC2, EKS/ECS, RDS, Lambda, S3, IAM, VPC
- Hands-on IaC tools: Terraform, Ansible, CloudFormation
- Observability tools: Prometheus, Grafana, CloudWatch, ELK, Datadog
- Programming: Go (required); Python, Bash, or PowerShell
- Strong networking fundamentals (DNS, load balancing)
- Strong troubleshooting & RCA skills
Preferred Skills
- Certifications: AWS / CKA / SRE Foundation
- Experience with chaos engineering
- Knowledge of SLI/SLO/Error Budgets
- Experience with multi-region / hybrid architectures
- Exposure to regulated environments (SOC 2, HIPAA, GDPR)