2 месяца назад

Staff Software Engineer (AI Infrastructure)

320 000 - 405 000$

Формат работы

hybrid

Тип работы

fulltime

Грейд

senior

Английский

Страна

Вакансия из Hirify Global, списка международных tech-компаний
Для мэтча и отклика нужен Plus

Мэтч & Сопровод

Для мэтча с этой вакансией нужен Plus

Описание вакансии

Текст:

TL;DR

Staff Software Engineer (AI Infrastructure): Developing and managing the full lifecycle of accelerator capacity and compute clusters to power frontier AI research with an accent on node ingestion, automated repair, and fleet scalability. Focus on designing systems to detect and remediate unhealthy hardware and scaling clusters across multiple clouds and accelerator families.

Location: Hybrid (Must be based in San Francisco, New York City, or Seattle)

Salary: $320,000 - $405,000 USD

Company

Anthropic is a public benefit corporation dedicated to creating reliable, interpretable, and steerable AI systems.

What you will do

Own the technical strategy and roadmap for node lifecycle management, including ingestion, bring-up, and automated repair.
Drive cross-team initiatives to build and scale AI clusters across multiple clouds and accelerator families (GPUs, TPUs, Trainium).
Design and operate automated systems to detect and remediate unhealthy hardware, maximizing fleet usability.
Define infrastructure architecture and solve critical technical challenges directly or through team leadership.
Collaborate with cloud providers and internal research/product teams to shape long-term compute and data strategy.
Establish operational excellence practices and provide technical mentorship to growth engineers.

Requirements

Deep expertise in distributed systems, reliability, and cloud platforms (Kubernetes, AWS, GCP, Azure).
Strong proficiency in Rust, Go, or Python, and experience with Terraform.
Hands-on experience with machine learning accelerators (GPUs, TPUs, or Trainium).
Proven track record of leading complex, multi-quarter technical initiatives spanning multiple teams.
Must be based in San Francisco, New York City, or Seattle (hybrid policy: minimum 25% office presence).
Bachelor's degree or equivalent professional experience in a relevant field.

Nice to have

8+ years of software engineering experience, including technical leadership.
Experience managing hyperscale compute infrastructure (10K+ nodes).
Depth in Kubernetes internals (scheduler, autoscaler, Karpenter) or similar cluster orchestration systems.
Low-level systems experience with kernels, virtualization, device drivers, or firmware.
Familiarity with high-performance networking (EFA, RDMA, InfiniBand) for ML workloads.

Culture & Benefits

Collaborative "big science" approach to AI research focusing on long-term safety and steerability.
Competitive compensation with optional equity donation matching.
Generous vacation, parental leave, and flexible working hours.
Visa sponsorship available for eligible candidates.
Modern office spaces in major US hubs to facilitate collaboration.

Будьте осторожны: если работодатель просит войти в их систему, используя iCloud/Google, прислать код/пароль, запустить код/ПО, не делайте этого - это мошенники. Обязательно жмите "Пожаловаться" или пишите в поддержку. Подробнее в гайде →