Company hidden
2 hours ago

Staff Infrastructure Engineer (Kubernetes)

Work format
onsite
Employment type
fulltime
Grade
senior
English
B2
Country
US
Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

Staff Infrastructure Engineer (Kubernetes): Design and evolve the Kubernetes control plane architecture for a multi-tenant, multi-region AI compute platform, with an emphasis on scalability, reliability, and operational ownership. Focus on multi-tenant cluster models, regional scaling strategies, networking integration, and production incident resolution.

Location: Las Vegas, Nevada. Must have authorization to work in the United States.

Company

Cloud platform delivering seamless, secure AI compute at scale across multiple data centers.

What you will do

  • Design and evolve Kubernetes control plane architecture across regions, including multi-tenant models like vcluster or Kamaji.
  • Own platform reliability, on-call rotation, incident response, and lifecycle management of clusters.
  • Implement multi-region scaling, cluster topology, and failure domain strategies.
  • Design networking architectures, optimize the CNI (Cilium) and pod-to-pod traffic, and integrate with high-performance networking.
  • Enhance observability for the control plane and cluster health, and lead root cause analysis.
  • Collaborate with DevOps, infrastructure, compute, storage, and networking teams.

Requirements

  • 7+ years in infrastructure, platform engineering, or distributed systems
  • Deep experience operating Kubernetes at scale in production across multiple clusters and regions
  • Strong Kubernetes internals knowledge: API server, scheduler, controller manager, etcd
  • Expertise in Linux systems and in troubleshooting Kubernetes, container runtimes, and networking
  • Experience with CNI plugins (Cilium preferred), resource isolation, scheduling
  • Experience in CSP, hyperscale, or large-scale environments strongly preferred

Nice to have

  • Virtual cluster technologies (vcluster, Kamaji)
  • Supporting GPU workloads in Kubernetes
  • NUMA-aware scheduling, topology-aware workloads
  • RDMA and high-throughput networking
  • Observability platforms (Prometheus, Grafana)

Culture & Benefits

  • 100% paid medical, dental, vision insurance for employees
  • Company HSA contributions, 100% paid short/long-term disability
  • 401(k), flexible PTO, paid holidays, parental leave
  • Flexible spending account, employee assistance program
  • Supplementary benefits: pet/legal insurance, virtual healthcare
  • Stock options, in-office perks
