Company hidden
Posted 2 hours ago

Senior Deployment Engineer (AI)

Work format
onsite
Employment type
full-time
Grade
senior
English
B2
Country
US
Listing from Hirify Global, a list of international tech companies

Job description

TL;DR

Senior Deployment Engineer (AI): Lead the hands-on bring-up of high-performance GPU clusters in data center environments, with an emphasis on hardware integration, high-speed fabric tuning, and performance validation. The role focuses on executing end-to-end node and rack deployments, troubleshooting complex distributed hardware issues, and building repeatable, scalable infrastructure processes.

Location: Must be based in the United States (Onsite travel required)

Company

A startup building next-generation AI infrastructure and scalable GPU clusters for frontier AI workloads.

What you will do

  • Execute end-to-end bring-up of GPU nodes and racks from installation to production readiness.
  • Validate BIOS, BMC, firmware configurations, and overall GPU cluster health.
  • Configure and validate high-speed network fabrics including InfiniBand and RoCE.
  • Perform cluster-wide burn-in, stress testing, and performance validation using NCCL and RDMA.
  • Develop automation playbooks to transform ad-hoc deployments into repeatable, scalable systems.
  • Collaborate with networking and hardware vendors to troubleshoot and resolve deployment issues.

Requirements

  • Must have 5–8+ years of experience in infrastructure engineering or data center operations.
  • Hands-on experience deploying GPU servers such as HGX or DGX platforms.
  • Proficiency with high-speed networking fabrics including InfiniBand, RoCE, and Ethernet.
  • Strong Linux systems knowledge and troubleshooting skills for distributed performance issues.
  • Must be comfortable working onsite in data center environments.
  • Must be authorized to work in the United States.

Nice to have

  • Experience in AI/ML infrastructure or HPC environments.
  • Familiarity with CUDA, NCCL, and RDMA protocols.
  • Automation proficiency using Python, Ansible, Terraform, or Bash.
  • Experience managing high-density power and cooling in data center environments.

Culture & Benefits

  • Opportunity to work on foundational AI infrastructure at a fast-growing startup.
  • High-impact role with significant ownership over infrastructure build-out.
  • Focus on urgency, bias toward action, and engineering excellence.
  • Direct collaboration with infrastructure and hardware teams.

The job text is reproduced without changes.