Company hidden
24 hours ago

Senior Manager, Site Reliability Engineering (SRE)

$143,000 - $191,000
Work format: remote (USA only)
Employment type: full-time
Grade: senior/lead
English: B2
Country: US

Job description


TL;DR

Senior Manager, Site Reliability Engineering (SRE): Lead the SRE organization to deliver reliable, scalable, and resilient platforms and services, with an emphasis on owning the strategy, implementation, and continuous improvement of a unified observability platform. The role focuses on driving practices around SLIs, SLOs, SLAs, error budgets, incident management, and automation, in close collaboration with other teams.

Location: Office Location or Remote - USA

Salary: $143,000 - $191,000 plus bonus

Company

The company (name hidden by hirify.global) is a healthcare business and data automation company that helps healthcare organizations improve patient care and maximize industry savings through its cloud-based supply chain technology exchange platform, solutions, analytics, and services.

What you will do

  • Hire, lead, and mentor a high-performing SRE team across geographies.
  • Define and execute the SRE vision, roadmap, and strategy in alignment with business and engineering objectives.
  • Build and manage a unified observability platform leveraging tools such as New Relic, Datadog, CloudWatch, Prometheus, Grafana, Graylog, and OpenTelemetry.
  • Define and manage SLIs, SLOs, SLAs, and Error Budgets across services.
  • Lead major incident response, coordinating communications with executives and stakeholders.
  • Collaborate with Engineering, Product, Security, Cloud, and DevOps teams to embed SRE practices.

Requirements

  • 12+ years of experience in SRE, Operations, or Infrastructure Engineering, with 5+ years in leadership roles.
  • Proven expertise in unified observability, monitoring, and alerting across infrastructure, applications, APM, and databases.
  • Strong knowledge of observability tools including New Relic, Datadog, Prometheus, Grafana, Graylog, CloudWatch, OpenTelemetry, and SolarWinds.
  • Hands-on experience with incident response, RCA, MTTR/MTTD reduction, and on-call management.
  • Deep understanding of SLIs, SLOs, SLAs, and Error Budgets.
  • Strong AWS experience (EC2, ECS, EKS, networking, Auto Scaling groups) and hands-on experience with Docker and Kubernetes.
  • Proficiency in Python, Java, C#, and shell scripting for automation.
  • Strong leadership, stakeholder management, and communication skills.

Nice to have

  • Experience in large-scale SaaS or product-driven environments.
  • Hands-on experience with databases: MongoDB, Elasticsearch, SQL Server, Oracle.
  • Experience with chaos engineering, resiliency testing, and disaster recovery planning.
  • Certifications: AWS Solutions Architect / DevOps Engineer, CKAD/CKA.
  • Experience managing global SRE teams across time zones.

Culture & Benefits

  • Establish a healthy 24x7 on-call model while promoting team well-being.
  • Drive a blameless culture through structured postmortems and RCA follow-up actions.
  • Health, vision, and dental insurance.
  • Accident and life insurance.
  • 401k matching.
  • Paid time off and education reimbursement.

The job posting text is reproduced without changes.