Company hidden
Posted 19 hours ago

Principal Deployment Engineer (AI Infrastructure)

Work format: on-site
Employment type: full-time
Grade: senior
English: B2
Country: US

Job description

TL;DR

Principal Deployment Engineer (AI Infrastructure): Lead hands-on bringup of GPU clusters in data center environments, with an emphasis on hardware integration, high-speed networking, and performance validation. The role focuses on building repeatable deployment processes, troubleshooting complex distributed systems, and ensuring production readiness for large-scale AI workloads.

Location: Must be based in the United States (Travel Required)

Company

hirify.global is a startup building next-generation AI infrastructure, delivering performant and scalable GPU clusters for frontier AI training and inference.

What you will do

  • Execute end-to-end bringup of GPU nodes and racks from installation to production readiness.
  • Validate BIOS, BMC, firmware configurations, and GPU health.
  • Configure and validate high-speed network fabrics including InfiniBand and RoCE.
  • Perform cluster-wide burn-in, stress testing, and performance validation using NCCL and RDMA.
  • Contribute to automation for provisioning and improve deployment playbooks.
  • Coordinate with hardware vendors and cross-functional teams to resolve bringup issues.

Requirements

  • Must be based in the United States and comfortable with travel.
  • 7–8+ years of experience in infrastructure engineering, hardware deployment, or data center operations.
  • Hands-on experience deploying GPU servers such as HGX or DGX platforms.
  • Strong knowledge of high-speed networking fabrics and Linux systems.
  • Experience troubleshooting distributed systems performance issues.

Nice to have

  • Experience in AI/ML infrastructure or HPC environments.
  • Familiarity with NCCL, CUDA, and RDMA.
  • Automation skills using Python, Ansible, Terraform, or Bash.
  • Experience in high-density power and cooling environments.

Culture & Benefits

  • Opportunity to build foundational AI infrastructure from zero to scale.
  • Fast-paced startup environment with a bias toward action and ownership.
  • Direct impact on the foundational technology powering frontier AI workloads.
