Staff Engineer (Kubernetes/AI)

$314,000 - $465,000

Work format

hybrid

Employment type

full-time

English

Country

Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

Staff Engineer (Kubernetes/AI): Build and scale a managed Kubernetes platform purpose-built for AI workloads, with an emphasis on bare-metal orchestration, GPU-aware scheduling, and high-performance networking. The role focuses on designing holistic infrastructure solutions that integrate compute, storage, and security to power next-generation AI training and inference at scale.

Location: Must be based in or able to commute to Bellevue, WA, San Francisco, CA, or San Jose, CA (4 days per week in-office)

Salary: $314,000 – $465,000

Company

hirify.global is a leader in AI cloud infrastructure, providing high-performance GPU compute to researchers and enterprises to make superintelligence ubiquitous.

What you will do

Drive the technical vision for a managed Kubernetes bare-metal platform, focusing on scalability, multi-tenancy, and lifecycle management.
Integrate and extend NVIDIA's open-source ecosystem, including GPU Operator, DCGM, and topology-aware scheduling.
Design and build higher-level platform services for inference, including autoscaling and multi-model deployment patterns.
Collaborate across infrastructure teams to define networking (RDMA, InfiniBand) and storage requirements for AI workloads.
Lead technical design sessions, mentor engineers, and establish best practices for distributed systems and Cloud Native engineering.
Build self-healing systems and automation for incident response and platform resilience at scale.

Requirements

10+ years of experience in software engineering, platform engineering, or SRE, including 5+ years focused on Kubernetes at scale.
Expert-level understanding of Kubernetes internals (API machinery, controllers, CNI, CSI).
Strong software engineering skills in Go (required) and Python.
Deep experience with GPU orchestration (NVIDIA GPU Operator, DCGM, MIG).
Holistic infrastructure expertise spanning compute, networking, storage, and security.
Must be able to work from the Bellevue, San Francisco, or San Jose office 4 days per week.

Nice to have

Experience building managed Kubernetes services (GKE, EKS, AKS).
Familiarity with HPC job schedulers like Slurm.
Contributions to CNCF projects or NVIDIA open-source projects.
Background in ML infrastructure, training clusters, or inference serving.

Culture & Benefits

Generous cash and equity compensation packages.
Comprehensive health, dental, and vision coverage for employees and dependents.
401(k) plan with 2% company match.
Flexible paid time off policy.
Opportunity to work with cutting-edge AI infrastructure and NVIDIA's latest technology.
