2 дня назад
SRE Leader (AWS)
Мэтч & Сопровод
Для мэтча с этой вакансией нужен Plus
Описание вакансии
Текст:
TL;DR
SRE Leader (AWS/Kubernetes): Owning the reliability, performance, and automation of a cloud-based real-time healthcare platform with an accent on uptime, observability, and self-healing systems. Focus on scaling high-traffic infrastructure, implementing SLIs/SLOs, and ensuring compliance with HIPAA and SOC 2 standards.
Location: Hybrid (New York City)
Company
is building an AI-powered platform that automates and orchestrates clinical workflows to improve efficiency in hospitals.
What you will do
- Ensure 99.99% uptime across the cloud platform, meeting strict SLAs for healthcare customers.
- Architect and manage scalable AWS infrastructure and optimize Kubernetes and Docker environments.
- Lead the adoption of Terraform for Infrastructure as Code (IaC) to fully automate management.
- Build and refine monitoring, alerting, and logging systems using Prometheus, Grafana, OpenTelemetry, and Datadog.
- Lead incident response, on-call operations, and conduct blameless postmortems to improve resilience.
- Partner with Security & Compliance teams to ensure infrastructure meets HIPAA and SOC 2 standards.
Requirements
- 10+ years of experience in Site Reliability Engineering or Cloud Infrastructure.
- Deep expertise in AWS, Kubernetes, and distributed systems.
- Strong background in observability tools (Prometheus, OpenTelemetry, or similar).
- Proven experience with CI/CD automation, GitOps, and Terraform.
- Mature leadership approach with the ability to drive technical strategy and mentor engineers.
- Strong understanding of network security and compliance frameworks like HIPAA and SOC 2.
Nice to have
- Experience with healthcare IT, including EHR data, FHIR, and HL7 interoperability.
- Expertise in real-time distributed systems or large-scale data pipelines.
- Prior experience leading major incident management processes.
Culture & Benefits
- Opportunity to own mission-critical reliability for a healthcare platform with 99.99% uptime.
- Work with cutting-edge AI-powered infrastructure and self-healing cloud systems.
- Automation-first culture focused on minimizing manual operations.
- Collaborate with a high-performing team of AI experts and healthcare innovators.
Будьте осторожны: если работодатель просит войти в их систему, используя iCloud/Google, прислать код/пароль, запустить код/ПО, не делайте этого - это мошенники. Обязательно жмите "Пожаловаться" или пишите в поддержку. Подробнее в гайде →
Похожие вакансии
6 дней назад
Staff SRE (Platform Engineering)
150 000 - 210 000$
6 дней назад
Senior Site Reliability Engineer (AI)
3 дня назад
Sr. DevOps Engineer (Azure)
176 362 - 293 937$
5 дней назад
DevOps/SRE Engineer (Fintech)
5 дней назад
Senior Manager, Site Reliability Engineering
187 000 - 243 000$
CrowdStrike
5 дней назад
SRE/DevOps Engineer (Cybersecurity)
120 000 - 180 000$