Site Reliability Engineering Manager (AI)
Job description
TL;DR
Site Reliability Engineering Manager (AWS/Kubernetes): Lead the SRE Operations team to ensure the reliability and scalability of cloud infrastructure, with an emphasis on moving from reactive NOC-style operations to proactive engineering practices. The role focuses on implementing SLOs, driving IaC standards via Terraform, and developing an AI strategy for infrastructure automation.
Location: Boston or New York
Salary: $185,000 - $215,000
Company
A leading public safety AI company that provides mission-critical intelligence to first responders and security teams to enable faster emergency responses.
What you will do
- Own the reliability, scalability, and operational health of Kubernetes clusters, shared services, and core AWS infrastructure.
- Drive the IaC foundation using Terraform and Atlantis to establish core engineering standards.
- Partner with Engineering Managers to define SLOs and error budgets, shifting operational ownership to product teams.
- Lead the Tier 1 on-call rotation and incident command for Sev-1 incidents, ensuring smooth escalation and resolution.
- Mentor engineers and manage team growth, including headcount planning and professional development.
- Shape the long-term AI strategy for infrastructure by identifying automation opportunities and operationalizing AI tooling.
Requirements
- 7+ years of experience in SRE, platform engineering, or DevOps.
- 2+ years of experience in a leadership role responsible for a team.
- Direct experience managing production Kubernetes and AWS infrastructure in high-availability environments.
- Ability to write and review production-quality Python scripts and tooling.
- Hands-on proficiency with Terraform, Helm, ArgoCD, Datadog, and RabbitMQ.
- Practical experience implementing SLOs, error budgets, and blameless postmortems.
Culture & Benefits
- Opportunity to work on a mission-driven product that directly impacts life-saving emergency responses.
- Competitive salary, comprehensive benefits, and equity participation.
- Dynamic, flexible, and fast-paced startup work environment with a highly talented team.