Site Reliability Engineering (SRE) Ops Team Lead
Job description
TL;DR
Site Reliability Engineering (SRE) Ops Team Lead: Lead the operations, reliability, and stability of production systems in hybrid cloud and on-prem environments, with an emphasis on incident response, observability, alerting, and automation. Focus on uptime, SLAs, cost optimization, capacity planning, and team leadership for high-availability SaaS platforms.
Location: United States (Remote). Restricted to US Persons only due to ITAR regulations.
Company
Global provider of mission-critical software solutions for various industries.
What you will do
- Own day-to-day operations, support, and high-stakes incident response for always-on production systems.
- Drive post-incident reviews, enforce runbooks, monitor SLIs/SLOs, and optimize on-call rotations.
- Manage observability, telemetry, and alerting with tools like Coralogix and FireHydrant, and build real-time dashboards.
- Champion automation, GitOps practices, Terraform infrastructure, and rigorous change reviews.
- Lead FinOps, capacity planning, cost optimization, and trade-offs for performance and reliability.
- Mentor the SRE team, escalate issues, collaborate cross-functionally, and manage workflows in Jira.
Requirements
- US Person status required due to ITAR restrictions on technical data access.
- Deep hands-on experience in production operations, SRE, DevOps, or Infrastructure in hybrid cloud and on-prem environments.
- Expertise in incident management, on-call best practices, and operational processes.
- Proficiency with GitOps, Terraform, and observability tools.
- Strong communication for leading incidents and cross-team coordination.
Nice to have
- Bachelor’s degree in Computer Science or Engineering, or equivalent experience.
- 5+ years in SRE, DevOps, Infrastructure, or Production Operations.
- Cloud certifications (AWS, Azure, Google Cloud).
- Experience in Agile/Scrum and Jira.
- Background supporting high-availability SaaS platforms.
Culture & Benefits
- Hands-on technical leadership in mission-critical systems with cutting-edge SRE and automation tech.
- Collaboration with global engineering and product teams.
- Competitive compensation and comprehensive benefits.
- Exciting growth opportunities in a fast-paced environment.