Company hidden
13 hours ago

Senior Manager, Production Engineering (AI)

$207,000 – $275,000
Work format
onsite
Employment type
fulltime
Grade
senior
English
c1
Country
US

Job description

TL;DR

Senior Manager, Production Engineering (AI): Lead and expand the SRE team to ensure the reliability and performance of a large-scale AI cloud platform, with an emphasis on operational excellence and automation. The role focuses on designing incident management processes, scaling distributed infrastructure, and implementing self-healing systems.

Location: Must be based in the US (Livingston, NJ; New York, NY; San Francisco, CA; Sunnyvale, CA; or Bellevue, WA). Must be a U.S. person (citizen, green card holder, etc.) for export control compliance.

Salary: $207,000 – $275,000

Company

hirify.global is the Essential Cloud for AI, providing high-performance infrastructure and technical expertise for AI labs, startups, and global enterprises.

What you will do

  • Execute the SRE vision and roadmap for large-scale, distributed cloud infrastructure.
  • Lead and mentor a high-performing team of SREs, promoting a culture of ownership and continuous learning.
  • Champion automation-first practices using AI, Terraform, Kubernetes, and Infrastructure-as-Code.
  • Establish and evolve Operational Excellence best practices so that platform operations are proactive rather than reactive.
  • Drive initiatives for incident management, root cause analysis, and system hardening.
  • Collaborate with engineering and product teams to build scalable, resilient, and self-healing systems.

Requirements

  • Bachelor’s degree in Computer Science, Engineering, or related field.
  • 10+ years in leadership or senior management at a cloud provider, hyperscaler, or high-growth tech company.
  • Experience managing geographically distributed 24x7 engineering teams.
  • Expertise in designing incident management processes (on-call rotations, SLO/SLA frameworks, postmortems).
  • Deep understanding of distributed systems, networking, and storage architecture.
  • Must be a U.S. person (U.S. citizen, national, lawful permanent resident, refugee, or asylee) to comply with export control regulations.

Nice to have

  • Experience with GPU-accelerated workloads, resource isolation, and performance tuning.
  • Prior leadership in bare metal infrastructure environments (custom data centers, HPC clusters).
  • Working knowledge of DPUs, service mesh architectures, and multi-tenant security models.
  • Experience in AI infrastructure supporting training or inference at scale.

Culture & Benefits

  • Comprehensive health, dental, and vision insurance (100% paid by the company).
  • 401(k) with a generous employer match and ESPP participation.
  • Equity awards and discretionary bonuses.
  • Flexible PTO and paid parental leave.
  • Daily catered lunch at office and data center locations.
  • Support for mental wellness and family-forming (Spring Health, Carrot).
