Site Reliability Engineering Manager (AI)

$185,000 - $215,000

Work format

Hybrid

Employment type

Full-time

Grade

Lead

English

Country

Job description

TL;DR

Site Reliability Engineering Manager (AWS/Kubernetes): Lead the SRE Operations team to ensure the reliability and scalability of cloud infrastructure, with an emphasis on moving from reactive NOC-style operations to proactive engineering practices. The role focuses on implementing SLOs, driving IaC standards via Terraform, and developing an AI strategy for infrastructure automation.

Location: Boston or New York

Salary: $185,000 - $215,000

Company

hirify.global is a leading public safety AI company that provides mission-critical intelligence to first responders and security teams to enable faster emergency responses.

What you will do

Own the reliability, scalability, and operational health of Kubernetes clusters, shared services, and core AWS infrastructure.
Drive the IaC foundation using Terraform and Atlantis to establish core engineering standards.
Partner with Engineering Managers to define SLOs and error budgets, shifting operational ownership to product teams.
Lead the Tier 1 on-call rotation and incident command for Sev-1 incidents, ensuring smooth escalation and resolution.
Mentor engineers and manage team growth, including headcount planning and professional development.
Shape the long-term AI strategy for infrastructure by identifying automation opportunities and operationalizing AI tooling.

Requirements

7+ years of experience in SRE, platform engineering, or DevOps.
2+ years of experience in a leadership role responsible for a team.
Direct experience managing production Kubernetes and AWS infrastructure in high-availability environments.
Ability to write and review production-quality Python scripts and tooling.
Hands-on proficiency with Terraform, Helm, ArgoCD, Datadog, and RabbitMQ.
Practical experience implementing SLOs, error budgets, and blameless postmortems.

Culture & Benefits

Opportunity to work on a mission-driven product that directly impacts life-saving emergency responses.
Competitive salary, comprehensive benefits, and equity participation.
Dynamic, flexible, and fast-paced startup work environment with a highly talented team.
