Company hidden
Posted 1 month ago

Senior GPU Infrastructure Engineer (AI)

Work format
onsite
Employment type
full-time
Level
senior
English
B2
Country
US
Listing from Hirify Global, a list of international tech companies.

Job description

TL;DR

Senior GPU Infrastructure Engineer (AI Cloud): Build and scale a GPU cloud marketplace through multi-tenant provisioning and virtualization, with an emphasis on bare-metal lifecycle management, GPU scheduling, and orchestration. The focus is on turning raw GPUs from global suppliers into programmable pools, optimizing them for AI/ML workloads, and delivering cost savings through efficient resource utilization.

Location: San Francisco, CA (onsite, full-time)

Company

hirify.global Labs is democratizing AI by aggregating global GPU resources into an open-access cloud marketplace and inference service for developers and researchers.

What you will do

  • Build the core orchestration layer for multi-tenant GPU provisioning and virtualization across diverse suppliers.
  • Implement bare-metal provisioning workflows using IPMI/Redfish, PXE boot, and automated OS deployment.
  • Develop GPU scheduling that accounts for GPU type, memory, and topology, using placement strategies that minimize fragmentation.
  • Design storage infrastructure for AI/ML, including object storage, high-IOPS block storage, and distributed file systems.
  • Handle API design, cloud-init integration, and observability for scalable production environments.
  • Collaborate with hardware vendors to troubleshoot and optimize integrations.

Requirements

  • Deep expertise in bare-metal provisioning, IPMI/Redfish, BMC management, PXE, and OS deployment.
  • Strong GPU orchestration knowledge: scheduling, memory management, multi-GPU jobs, topology.
  • Proficiency in Terraform/Pulumi, CI/CD, secrets management, configuration management, and observability.
  • Experience with AI/ML storage: object, block, distributed file systems.
  • Skills in API design, cloud-init, GPU architecture, and CUDA optimization.
  • Proven track record of scaling cloud/distributed systems and collaborating with vendors and stakeholders.

Nice to have

  • Familiarity with high-performance networking (InfiniBand, RoCE).
  • Experience with distributed storage such as Ceph, Weka, or VAST Data.

Culture & Benefits

  • Work with co-founders holding PhDs in AI, Math, Computer Science.
  • Join a post-Series A company preparing for growth, in a forward-thinking, inclusive environment.
  • Equal opportunity employer committed to diversity and inclusion.
