Company hidden
Posted 20 hours ago

Site Reliability Engineering Manager (AI)

$185,000 - $215,000
Work format
hybrid
Employment type
fulltime
Grade
lead
English
B2
Country
US
Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

Site Reliability Engineering Manager (AWS/Kubernetes): Lead the SRE Operations team to ensure the reliability and scalability of cloud infrastructure, with an emphasis on moving from reactive NOC-style operations to proactive engineering practices. The role focuses on implementing SLOs, driving IaC standards via Terraform, and developing an AI strategy for infrastructure automation.

Location: Boston or New York

Salary: $185,000 - $215,000

Company

hirify.global is a leading public safety AI company that provides mission-critical intelligence to first responders and security teams to enable faster emergency responses.

What you will do

  • Own the reliability, scalability, and operational health of Kubernetes clusters, shared services, and core AWS infrastructure.
  • Drive the IaC foundation using Terraform and Atlantis to establish core engineering standards.
  • Partner with Engineering Managers to define SLOs and error budgets, shifting operational ownership to product teams.
  • Lead the Tier 1 on-call rotation and incident command for Sev-1 incidents, ensuring smooth escalation and resolution.
  • Mentor engineers and manage team growth, including headcount planning and professional development.
  • Shape the long-term AI strategy for infrastructure by identifying automation opportunities and operationalizing AI tooling.

Requirements

  • 7+ years of experience in SRE, platform engineering, or DevOps.
  • 2+ years of experience in a leadership role responsible for a team.
  • Direct experience managing production Kubernetes and AWS infrastructure in high-availability environments.
  • Ability to write and review production-quality Python scripts and tooling.
  • Hands-on proficiency with Terraform, Helm, ArgoCD, Datadog, and RabbitMQ.
  • Practical experience implementing SLOs, error budgets, and blameless postmortems.

Culture & Benefits

  • Opportunity to work on a mission-driven product that directly impacts life-saving emergency responses.
  • Competitive salary, comprehensive benefits, and equity participation.
  • Dynamic, flexible, and fast-paced startup work environment with a highly talented team.
