Senior GPU Infrastructure Engineer (AI)

Work format

onsite

Employment type

full-time

Grade

senior

English

Country

Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

Senior GPU Infrastructure Engineer (AI Cloud): Build and scale a GPU cloud marketplace through multi-tenant provisioning and virtualization, with an emphasis on bare-metal lifecycle management, GPU scheduling, and orchestration. The role focuses on turning raw GPUs from global suppliers into programmable pools, optimizing for AI/ML workloads, and delivering cost savings through efficient resource utilization.

Location: San Francisco, CA (onsite, full-time)

Company

hirify.global Labs is democratizing AI by aggregating global GPU resources into an open-access cloud marketplace and inference service for developers and researchers.

What you will do

Build the core orchestration layer for multi-tenant GPU provisioning and virtualization across diverse suppliers.
Implement bare-metal provisioning workflows using IPMI/Redfish, PXE boot, and automated OS deployment.
Develop GPU scheduling that accounts for GPU type, memory, and topology, with placement strategies that minimize fragmentation.
Design storage infrastructure for AI/ML including object storage, high-IOPS block, and distributed file systems.
Integrate API design, cloud-init, and observability for scalable production environments.
Collaborate with hardware vendors to troubleshoot and optimize integrations.

Requirements

Deep expertise in bare-metal provisioning, IPMI/Redfish, BMC management, PXE, and OS deployment.
Strong GPU orchestration knowledge: scheduling, memory management, multi-GPU jobs, topology.
Proficiency in Terraform/Pulumi, CI/CD, secrets management, configuration management, and observability.
Experience with AI/ML storage: object, block, and distributed file systems.
Skills in API design, cloud-init, GPU architecture, and CUDA optimization.
Proven track record of scaling cloud/distributed systems and collaborating with vendors and stakeholders.

Nice to have

Familiarity with high-performance networking such as InfiniBand and RoCE.
Experience with distributed storage such as Ceph, Weka, and VAST Data.

Culture & Benefits

Work with co-founders holding PhDs in AI, Math, and Computer Science.
Join ahead of post-Series A growth in a forward-thinking, inclusive environment.
Equal opportunity employer committed to diversity and inclusion.
