This vacancy is archived

Company hidden
updated 2 months ago

Senior Director Fleet Reliability Operations

$212,000–$311,000
Work format
hybrid
Employment type
full-time
Grade
director
English
B2
Country
US

Job description

TL;DR

Senior Director Fleet Reliability Operations (System Engineering): Lead the evolution and management of a global GPU server fleet with an emphasis on automation, resilience, and scale. Focus on architecting scalable, reliable, and automated infrastructure systems for supercomputing clusters and on leading a high-performing global operations team.

Location: Hybrid in Livingston, NJ / New York, NY / Sunnyvale, CA / Bellevue, WA, USA

Salary: $212,000–$311,000

Company

hirify.global is a publicly traded cloud infrastructure company specializing in AI-focused supercomputing platforms, delivering high-performance GPU server fleets for AI labs, startups, and enterprises.

What you will do

  • Lead and grow a global management team for fleet reliability operations.
  • Develop and drive the Fleet Operations roadmap prioritizing automation, resilience, and scale.
  • Collaborate cross-functionally with hardware, platform, network, data center, and vendor teams.
  • Champion operational excellence, metrics, and blameless incident response.
  • Drive an automation-first strategy to reduce toil and increase innovation.
  • Cultivate a culture of reliability, mentorship, and continuous improvement.

Requirements

  • Must be a U.S. person or eligible to access export-controlled information per U.S. Government regulations.
  • 10+ years of experience in infrastructure, platform engineering, SRE, or DevOps.
  • 5+ years of leadership experience managing mission-critical global production environments.
  • Deep technical knowledge of data center operations, fleet provisioning, lifecycle management, and observability tooling.
  • Strong automation, monitoring, and scalable fleet management skills.
  • Effective communicator and collaborator across complex cross-functional teams.

Nice to have

  • Experience managing global fleets of dense GPU or HPC compute infrastructure.
  • Background in architecture or development of infrastructure management platforms and workflows.
  • Prior roles owning uptime, incident response, or reliability engineering in hyperscale environments.

Culture & Benefits

  • Comprehensive medical, dental, and vision insurance fully paid by the employer.
  • Company-paid life insurance and disability coverage.
  • Flexible spending and health savings accounts.
  • Tuition reimbursement and employee stock purchase program participation.
  • Paid parental leave, flexible PTO, and childcare support.
  • 401(k) with employer match, and a casual, innovative work environment.

Hiring process

  • Onboarding at one of the company hubs within the first month.
  • Quarterly team gatherings to support collaboration.
  • Reasonable accommodations provided for candidates with disabilities upon request.