Company hidden
1 day ago

Staff Engineer (Kubernetes/AI)

$314,000 – $465,000
Work format
hybrid
Employment type
full-time
English
B2
Country
US

Job description

TL;DR

Staff Engineer (Kubernetes/AI): Build and scale a managed Kubernetes platform purpose-built for AI workloads, with an emphasis on bare-metal orchestration, GPU-aware scheduling, and high-performance networking. The role focuses on designing holistic infrastructure solutions that integrate compute, storage, and security to power next-generation AI training and inference at scale.

Location: Must be based in or able to commute to Bellevue, WA, San Francisco, CA, or San Jose, CA (4 days per week in-office)

Salary: $314,000 – $465,000

Company

hirify.global is a leader in AI cloud infrastructure, providing high-performance GPU compute to researchers and enterprises to make superintelligence ubiquitous.

What you will do

  • Drive the technical vision for a managed bare-metal Kubernetes platform, focusing on scalability, multi-tenancy, and lifecycle management.
  • Integrate and extend NVIDIA's open-source ecosystem, including GPU Operator, DCGM, and topology-aware scheduling.
  • Design and build higher-level platform services for inference, including autoscaling and multi-model deployment patterns.
  • Collaborate across infrastructure teams to define networking (RDMA, InfiniBand) and storage requirements for AI workloads.
  • Lead technical design sessions, mentor engineers, and establish best practices for distributed systems and Cloud Native engineering.
  • Build self-healing systems and automation for incident response and platform resilience at scale.

Requirements

  • 10+ years of experience in software engineering, platform engineering, or SRE, with 5+ years focused on Kubernetes at scale.
  • Expert-level understanding of Kubernetes internals (API machinery, controllers, CNI, CSI).
  • Strong software engineering skills in Go (required) and Python.
  • Deep experience with GPU orchestration (NVIDIA GPU Operator, DCGM, MIG).
  • Holistic infrastructure expertise spanning compute, networking, storage, and security.
  • Must be able to work from the Bellevue, San Francisco, or San Jose office 4 days per week.

Nice to have

  • Experience building managed Kubernetes services (GKE, EKS, AKS).
  • Familiarity with HPC job schedulers like Slurm.
  • Contributions to CNCF projects or NVIDIA open-source projects.
  • Background in ML infrastructure, training clusters, or inference serving.

Culture & Benefits

  • Generous cash and equity compensation packages.
  • Comprehensive health, dental, and vision coverage for employees and dependents.
  • 401k plan with 2% company match.
  • Flexible paid time off policy.
  • Opportunity to work with cutting-edge AI infrastructure and NVIDIA's latest technology.
