This job posting has been archived
Posted 4 days ago
Site Reliability Engineer (Kubernetes)
Job description
TL;DR
Site Reliability Engineer (Kubernetes): Improve the availability, performance, and scalability of large-scale, multi-cloud SaaS environments, with an emphasis on automation, observability, and incident response. The role focuses on designing backend services and production engineering tools while integrating AI-assisted workflows to improve operational efficiency.
Company
is a software company providing a platform to manage, accelerate, and secure software delivery from code to production.
What you will do
- Support the reliability, performance, and scalability of large-scale, multi-cloud, Kubernetes-based SaaS environments.
- Investigate and troubleshoot production issues across distributed systems and infrastructure in collaboration with Engineering teams.
- Design and develop backend services, internal platforms, and production engineering tools using Python or Go.
- Improve observability and operational readiness through SRE practices, monitoring, and capacity planning.
- Evaluate and contribute to AI-assisted automation solutions to improve troubleshooting and production workflows.
- Participate in on-call rotations and lead incident response to ensure system stability.
Requirements
- 2-4 years of experience in SRE, Production Engineering, or DevOps roles.
- Hands-on experience with Kubernetes-based containerized workloads.
- Experience with at least one public cloud provider: AWS, GCP, or Azure.
- Proficiency in developing backend services or automation tools using Python, Go, or similar languages.
- Strong understanding of Linux fundamentals, networking, and production troubleshooting.
- Familiarity with CI/CD tools and observability platforms like Prometheus or Grafana.
Nice to have
- Experience using AI-assisted operational workflows for log analysis or incident triage.
- Familiarity with agentic automation frameworks such as LangGraph or LangChain.
- Experience with AI-assisted development tools like GitHub Copilot or Cursor.
Culture & Benefits
- Opportunity to work on a mission-critical platform used by the majority of the Fortune 100.
- Collaborative, impact-driven environment built around modern SRE practices.
- Continuous learning culture with exposure to cutting-edge technologies and AI integration.