Senior GPU Infrastructure Engineer (AI)
Job description
TL;DR
Senior GPU Infrastructure Engineer (AI Cloud): Build and scale a GPU Cloud Marketplace through multi-tenant provisioning and virtualization, with an emphasis on bare-metal lifecycle management, GPU scheduling, and orchestration. The role focuses on turning raw GPUs from global suppliers into programmable pools, optimizing them for AI/ML workloads, and cutting costs through efficient resource utilization.
Location: San Francisco, CA (onsite, full-time)
Company
Labs is democratizing AI by aggregating global GPU resources into an open-access cloud marketplace and inference service for developers and researchers.
What you will do
- Build the core orchestration layer for multi-tenant GPU provisioning and virtualization across diverse suppliers.
- Implement bare-metal provisioning workflows using IPMI/Redfish, PXE boot, and automated OS deployment.
- Develop GPU scheduling that accounts for GPU type, memory, and topology, with placement strategies that minimize fragmentation.
- Design storage infrastructure for AI/ML including object storage, high-IOPS block, and distributed file systems.
- Integrate API design, cloud-init, and observability for scalable production environments.
- Collaborate with hardware vendors to troubleshoot and optimize integrations.
Requirements
- Deep expertise in bare-metal provisioning, IPMI/Redfish, BMC management, PXE, and OS deployment.
- Strong GPU orchestration knowledge: scheduling, memory management, multi-GPU jobs, topology.
- Proficiency in Terraform/Pulumi, CI/CD, secrets management, configuration, observability.
- Experience with AI/ML storage: object, block, distributed file systems.
- Skills in API design, cloud-init, GPU architecture, CUDA optimization.
- Proven track record of scaling cloud or distributed systems, and of collaborating with vendors and stakeholders.
Nice to have
- Familiarity with high-performance networking such as InfiniBand or RoCE.
- Experience with distributed storage such as Ceph, Weka, or VAST Data.
Culture & Benefits
- Work with co-founders holding PhDs in AI, Math, Computer Science.
- Be part of post-Series A growth in a forward-thinking, inclusive environment.
- Equal opportunity employer committed to diversity and inclusion.