2 months ago

Staff Software Engineer, Kubernetes Platform (AI)

$320,000 - $405,000 USD

Work format

hybrid

Employment type

full-time

Grade

senior

English

Country

UK/US

Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

Staff Software Engineer (Kubernetes): Build and scale the Kubernetes control plane to support massive fleets of nodes for training and serving frontier AI models, with an emphasis on custom scheduler development and control plane scalability. The role focuses on designing topology-aware ML workload placement and optimizing core cluster services for extreme scale.

Location: Hybrid; must be based in San Francisco, New York City, or Seattle (minimum 25% office presence required)

Salary: $320,000 - $405,000 USD

Company

Anthropic is a public benefit corporation dedicated to creating reliable, interpretable, and steerable AI systems that are safe and beneficial for society.

What you will do

Own and extend the Kubernetes scheduler for accelerator fleets, implementing custom plugins for gang scheduling and topology awareness.
Scale the Kubernetes control plane (apiserver, etcd, controller-manager) to support clusters far beyond typical industry limits.
Design and operate core cluster services, such as service discovery, that every workload in the fleet depends on.
Build and maintain custom controllers, operators, and CRDs to enhance platform capabilities.
Partner with research, training, and inference teams to translate ML workload requirements into platform features.
Lead incident response, manage on-call rotations, and design reliability processes including SLOs and postmortems.

Requirements

Significant software engineering experience building and operating production distributed systems.
Proficiency in at least one systems-appropriate language (Go, Python, Rust, or C++).
Deep hands-on Kubernetes expertise, specifically regarding the scheduler, controllers, and apiserver.
Ability to debug complex issues across the stack, from API behavior to node and network-level root causes.
Must be based in the USA (SF, NYC, or Seattle) to comply with the hybrid office policy.
Bachelor’s degree or equivalent combination of education and experience.

Nice to have

Contributions to Kubernetes internals (kube-scheduler, etcd, client-go, etc.).
Experience with batch systems such as Slurm, Volcano, or Kueue.
Familiarity with ML infrastructure, including GPUs, TPUs, and collective networking (NCCL).
Low-level systems experience with Linux kernel tuning, cgroups, or eBPF.
8+ years of industry experience leading large, ambiguous infrastructure projects.

Culture & Benefits

Collaborative "big science" research environment focused on high-impact AI safety goals.
Competitive compensation with optional equity donation matching.
Generous vacation and parental leave policies.
Flexible working hours and modern collaborative office spaces.
Visa sponsorship support available for qualifying candidates.
