Company hidden
12 hours ago

Staff Software Engineer, Kubernetes Platform (AI)

$320,000 - $405,000
Work format
Hybrid
Employment type
Full-time
Level
Senior
English
B2
Country
US
Listing from Hirify Global, a list of international tech companies

Job description

TL;DR

Staff Software Engineer (Kubernetes): Build and scale the Kubernetes control plane that supports massive node fleets for training and serving frontier AI models, with an emphasis on custom scheduler development and control plane scalability. Focus areas include topology-aware placement of ML workloads and optimizing core cluster services for extreme scale.

Location: Hybrid; must be based in San Francisco, New York City, or Seattle (minimum 25% office presence required)

Salary: $320,000 - $405,000 USD

Company

The company is a public benefit corporation dedicated to creating reliable, interpretable, and steerable AI systems that are safe and beneficial for society.

What you will do

  • Own and extend the Kubernetes scheduler for accelerator fleets, implementing custom plugins for gang scheduling and topology awareness.
  • Scale the Kubernetes control plane (apiserver, etcd, controller-manager) to support clusters far beyond typical industry limits.
  • Design and operate core cluster services, such as service discovery, that every workload in the fleet depends on.
  • Build and maintain custom controllers, operators, and CRDs to enhance platform capabilities.
  • Partner with research, training, and inference teams to translate ML workload requirements into platform features.
  • Lead incident response, manage on-call rotations, and design reliability processes including SLOs and postmortems.

Requirements

  • Significant software engineering experience building and operating production distributed systems.
  • Proficiency in at least one systems-appropriate language (Go, Python, Rust, or C++).
  • Deep hands-on Kubernetes expertise, particularly with the scheduler, controllers, and apiserver.
  • Ability to debug complex issues across the stack, from API behavior to node and network-level root causes.
  • Must be based in the USA (SF, NYC, or Seattle) to comply with the hybrid office policy.
  • Bachelor’s degree or equivalent combination of education and experience.

Nice to have

  • Contributions to Kubernetes internals (kube-scheduler, etcd, client-go, etc.).
  • Experience with batch systems such as Slurm, Volcano, or Kueue.
  • Familiarity with ML infrastructure, including GPUs, TPUs, and collective networking (NCCL).
  • Low-level systems experience with Linux kernel tuning, cgroups, or eBPF.
  • 8+ years of industry experience leading large, ambiguous infrastructure projects.

Culture & Benefits

  • Collaborative "big science" research environment focused on high-impact AI safety goals.
  • Competitive compensation with optional equity donation matching.
  • Generous vacation and parental leave policies.
  • Flexible working hours and modern collaborative office spaces.
  • Visa sponsorship support available for qualifying candidates.
