Senior Site Reliability Engineer (Azure)
Мэтч & Сопровод
Для мэтча с этой вакансией нужен Plus
Описание вакансии
TL;DR
Senior Site Reliability Engineer (Azure): Championing the reliability, availability, and performance of an Azure-based healthcare platform with an accent on SRE practices, observability frameworks, and incident management. Focus on reducing operational toil through automation, optimizing system performance, and ensuring compliance with HIPAA and SOC 2 requirements.
Location: Hybrid - at least 3 days/week in one of the offices: Dallas, TX; Chicago/Evanston; New York; or Washington, DC
Company
A specialty care platform connecting patients with a Network of Excellence of top specialists for surgery, cancer care, and infusions.
What you will do
- Define and track SLOs, SLIs, and error budgets for critical healthcare services.
- Build and maintain observability platforms using Datadog and Azure Monitor.
- Lead incident management processes via Rootly, including on-call rotations and post-incident reviews.
- Automate operational toil using Terraform and custom tooling.
- Design and implement disaster recovery and business continuity strategies.
- Optimize Azure infrastructure for performance, capacity, and cost efficiency.
Requirements
- 4+ years of experience in SRE, DevOps, or production operations roles.
- 3+ years of experience with Microsoft Azure.
- Strong proficiency with observability tools (Datadog, Azure Monitor, Prometheus, etc.).
- Hands-on experience with Terraform and CI/CD tools (Azure DevOps, GitHub Actions).
- Strong scripting skills in Python, Bash, or PowerShell.
- Must be able to be on-call 1 week per month.
Nice to have
- Experience with Azure Kubernetes Service (AKS) and containerized workloads.
- Deep expertise in chaos engineering and reliability testing.
- Relevant certifications in Azure, SRE, or Kubernetes.
- Experience working in regulated environments (HIPAA preferred).
Culture & Benefits
- Comprehensive insurance: Medical, Dental, and Vision.
- Financial security with 401k company match and Life Insurance.
- Work-life balance via Flexible Time Off and Paid Parental Leave.
- Culture rooted in logic, inclusion, grit, humanity, and truth.
Будьте осторожны: если работодатель просит войти в их систему, используя iCloud/Google, прислать код/пароль, запустить код/ПО, не делайте этого - это мошенники. Обязательно жмите "Пожаловаться" или пишите в поддержку. Подробнее в гайде →