Назад
Company hidden
2 дня назад

SRE Leader (AWS)

Формат работы
hybrid
Тип работы
fulltime
Грейд
lead
Английский
b2
Страна
US
Вакансия из списка Hirify.GlobalВакансия из Hirify Global, списка международных tech-компаний
Для мэтча и отклика нужен Plus

Мэтч & Сопровод

Для мэтча с этой вакансией нужен Plus

Описание вакансии

Текст:
/

TL;DR

SRE Leader (AWS/Kubernetes): Owning the reliability, performance, and automation of a cloud-based real-time healthcare platform with an accent on uptime, observability, and self-healing systems. Focus on scaling high-traffic infrastructure, implementing SLIs/SLOs, and ensuring compliance with HIPAA and SOC 2 standards.

Location: Hybrid (New York City)

Company

hirify.global is building an AI-powered platform that automates and orchestrates clinical workflows to improve efficiency in hospitals.

What you will do

  • Ensure 99.99% uptime across the cloud platform, meeting strict SLAs for healthcare customers.
  • Architect and manage scalable AWS infrastructure and optimize Kubernetes and Docker environments.
  • Lead the adoption of Terraform for Infrastructure as Code (IaC) to fully automate management.
  • Build and refine monitoring, alerting, and logging systems using Prometheus, Grafana, OpenTelemetry, and Datadog.
  • Lead incident response, on-call operations, and conduct blameless postmortems to improve resilience.
  • Partner with Security & Compliance teams to ensure infrastructure meets HIPAA and SOC 2 standards.

Requirements

  • 10+ years of experience in Site Reliability Engineering or Cloud Infrastructure.
  • Deep expertise in AWS, Kubernetes, and distributed systems.
  • Strong background in observability tools (Prometheus, OpenTelemetry, or similar).
  • Proven experience with CI/CD automation, GitOps, and Terraform.
  • Mature leadership approach with the ability to drive technical strategy and mentor engineers.
  • Strong understanding of network security and compliance frameworks like HIPAA and SOC 2.

Nice to have

  • Experience with healthcare IT, including EHR data, FHIR, and HL7 interoperability.
  • Expertise in real-time distributed systems or large-scale data pipelines.
  • Prior experience leading major incident management processes.

Culture & Benefits

  • Opportunity to own mission-critical reliability for a healthcare platform with 99.99% uptime.
  • Work with cutting-edge AI-powered infrastructure and self-healing cloud systems.
  • Automation-first culture focused on minimizing manual operations.
  • Collaborate with a high-performing team of AI experts and healthcare innovators.

Будьте осторожны: если работодатель просит войти в их систему, используя iCloud/Google, прислать код/пароль, запустить код/ПО, не делайте этого - это мошенники. Обязательно жмите "Пожаловаться" или пишите в поддержку. Подробнее в гайде →