Principal Technical Program Manager (AI Infrastructure)

Work format

remote (Global)

Employment type

full-time

Grade

senior

English

Country

Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

Principal Technical Program Manager (AI Infrastructure): Drive the operational stability and scaling of high-performance GPU fleets and InfiniBand networks, with an emphasis on availability and uptime metrics. The role focuses on optimizing operational workflows, leading large-scale infrastructure build-outs, and connecting engineering execution with strategic business goals.

Location: Remote (Global). Geography is no barrier to impact or connection.

Company

hirify.global is a GPU cloud engineered for AI, providing cost-effective, high-performance infrastructure for AI startups and large enterprise customers.

What you will do

Lead strategic operational programs, including new data center AI infrastructure build-outs and large-scale fleet software/firmware rollouts.
Establish and drive accountability for critical infrastructure KPIs, specifically targeting 97.5% availability and 99% uptime.
Analyze and optimize operational workflows across Fleet Operations, Network Operations, and SRE to reduce toil and improve MTTR.
Act as the primary liaison between hardware, compute platform, and network engineering teams, as well as external GPU and network hardware vendors.
Translate capacity planning models into actionable infrastructure delivery and readiness roadmaps.
Identify and mitigate technical, schedule, and resource risks related to AI infrastructure scaling.

Requirements

5+ years of experience in Technical Program Management driving large-scale infrastructure or software engineering programs.
Strong foundational understanding of data center infrastructure, distributed systems, Linux, and networking concepts.
Proven expertise in modern program management methodologies (Agile, Scrum); PMP certification preferred.
Experience defining and improving system performance based on operational metrics (SLOs, SLIs, MTTR).
Ability to thrive in a fast-paced, high-growth environment and manage multiple priorities under ambiguity.

Nice to have

Direct experience with data center infrastructure build-outs and hardware commissioning.
Domain knowledge of AI/HPC infrastructure, including NVIDIA GPUs and InfiniBand/RDMA networks.
Experience in hyperscale or public cloud environments supporting 24/7 mission-critical services.
Familiarity with SRE principles, automation tooling, and CI/CD pipelines for infrastructure.
Bachelor's or Master's degree in Computer Science, Engineering, or a related technical field.

Culture & Benefits

Highly competitive package including base salary and equity with annual reviews.
Remote-first team culture with high autonomy and human-first flexibility.
Opportunity to join a fast-growing tech startup and work on cutting-edge AI infrastructure.
Dynamic progression plan tailored to individual ambitions and ownership.
