Manager, Site Reliability Engineering (Azure)
Мэтч & Сопровод
Для мэтча с этой вакансией нужен Plus
Описание вакансии
TL;DR
Manager, Site Reliability Engineering (Azure/DevOps): Leading and growing the SRE team to ensure the reliability and performance of a high-scale healthcare workforce platform with an accent on AIOps and automation. Focus on designing reliability architectures, leading major incident responses, and operationalizing AI-assisted workflows to reduce MTTD and MTTR.
Location: Remote (Must be based in the US)
Salary: $230,000 – $255,000
Company
is a leading healthcare workforce solutions provider building tech-enabled marketplaces for healthcare talent.
What you will do
- Lead, mentor, and grow a team of high-performing Site Reliability Engineers, managing hiring and career development.
- Define and operationalize SLOs, SLIs, and error budgets to drive the platform reliability strategy.
- Build an AIOps practice using anomaly detection and automated triage to reduce MTTD and MTTR.
- Lead major incident response as senior incident commander for severity-1 events and implement systemic fixes.
- Partner with FinOps to optimize platform unit economics, right-sizing, and capacity efficiency.
- Ensure all reliability and infrastructure decisions uphold HIPAA, PHI, and SOC 2 compliance.
Requirements
- Must be based in the US.
- 10+ years of experience in SRE, DevOps, or Platform Engineering roles.
- 4+ years of direct people management experience, including running remote on-call teams.
- Deep production experience with Azure (AKS, networking, identity, and platform services).
- Production-grade fluency with modern observability tools like Datadog.
- Hands-on experience integrating AI/LLM-assisted tooling into operational workflows.
Nice to have
- Production experience with Cloudflare CDN, WAF, Workers, and ZTNA.
- Advanced IaC skills using Terragrunt and Terraform in multi-environment pipelines.
- Experience with GitHub Actions, OPA/Conftest, and DORA-metric instrumentation.
- Proficiency with ServiceNow for ITSM integration.
- Experience with Chaos engineering and resilience exercises.
Culture & Benefits
- Comprehensive premium medical, dental, life, and vision insurance.
- Generous 401(k) match.
- Unlimited Paid Time Off (DTO).
- Paid sick leave according to state and federal laws.
- Daily virtual wellness classes, including yoga and meditation.
- Company-sponsored virtual events and birthday celebrations.
Будьте осторожны: если работодатель просит войти в их систему, используя iCloud/Google, прислать код/пароль, запустить код/ПО, не делайте этого - это мошенники. Обязательно жмите "Пожаловаться" или пишите в поддержку. Подробнее в гайде →