Lead Software Engineer - Site Reliability (SRE)
Job description
TL;DR
Lead Software Engineer - Site Reliability (SRE): Design for resilience, automate recovery, and ensure system stability and observability at scale, with an emphasis on SLIs/SLOs and performance engineering. The role focuses on building automated monitoring pipelines, leading incident response, and implementing high-availability distributed systems.
Company
The company builds uncomplicated service software that delivers exceptional employee and customer experiences through enterprise-grade CX and IT solutions.
What you will do
- Design and implement tools to improve system availability, latency, scalability, and overall health.
- Define SLIs/SLOs, manage error budgets, and drive performance engineering efforts.
- Build and maintain automated monitoring, alerting, and remediation pipelines.
- Lead incident response, perform root cause analysis, and drive blameless postmortems.
- Champion observability across services using logs, metrics, and traces.
- Contribute to infrastructure architecture, automation, and reliability roadmaps.
Requirements
- 7–12 years of experience in SRE, DevOps, or Production Engineering roles.
- Strong coding proficiency and in-depth Linux expertise for advanced troubleshooting.
- Practical experience with Docker and Kubernetes for application deployment and orchestration.
- Experience designing and maintaining Continuous Integration and Continuous Delivery (CI/CD) pipelines.
- Proficiency in Infrastructure as Code (IaC) tools and infrastructure automation.
- Deep knowledge of Disaster Recovery (DR) and High Availability (HA) strategies for distributed systems.
Nice to have
- Degree in Computer Science, Engineering, or a related field.
- Experience scaling services in production with high uptime targets (99.99%+).
- Proven track record of reducing incident frequency and improving MTTD/MTTR metrics.
Culture & Benefits
- Inclusive environment welcoming colleagues of all backgrounds, genders, and orientations.
- Commitment to equal opportunity and workplace diversity.
- People-first approach to AI and a culture of reducing complexity.