Company hidden
4 days ago

Senior Software Engineer II (AI)

$182,000 - $242,000
Work format
onsite
Employment type
full-time
Grade
senior
English
B2
Country
US
Listing from Hirify Global, a list of international tech companies
Job description

TL;DR

Senior Software Engineer II (AI): Build and scale Kubernetes-native research cluster platforms and sandbox infrastructure for agentic training, with an emphasis on distributed systems, workload orchestration, and ML infrastructure. The focus is on designing high-performance tools that let researchers train models at scale without managing the underlying infrastructure.

Location: Must be based in or able to work from Sunnyvale, CA or Bellevue, WA

Salary: $182,000 - $242,000

Company

hirify.global is a specialized cloud provider delivering high-performance infrastructure for AI, trusted by leading labs and enterprises to accelerate breakthroughs in machine learning.

What you will do

  • Design and build a complete research cluster experience including CLI, job configuration schemas, and Kubernetes operators.
  • Own the Python SDK for sandbox infrastructure to enable large-scale RL training and agent rollouts.
  • Collaborate directly with customers at large AI labs to understand their supercomputing stacks and translate their needs into system designs.
  • Develop and maintain Kubernetes-native primitives for compute, storage, and networking.
  • Write technical documentation to help customers run popular OSS training frameworks on the platform.

Requirements

  • 8–12+ years of experience in distributed systems, ML infrastructure, or developer platforms.
  • Deep Kubernetes expertise including custom controllers, operators, scheduling, and CRDs at scale.
  • U.S. work authorization required due to export control compliance (U.S. citizen, permanent resident, or eligible for export authorization).
  • Proven track record of shipping production-grade infrastructure systems.
  • Strong communication skills for working directly with customers and translating their needs into system designs.
  • Understanding of distributed training workflows and researcher productivity bottlenecks.

Nice to have

  • Experience building internal ML platforms at large-scale training companies.
  • Familiarity with agentic AI, RL training, and sandbox isolation techniques.
  • Background with Slurm, Ray, or similar workload orchestration tools.
  • Experience with container runtimes such as gVisor or Kata Containers.
  • OSS contributions to Kubernetes SIGs, Ray, or PyTorch.

Culture & Benefits

  • Comprehensive medical, dental, and vision insurance (100% paid).
  • 401(k) with generous employer match.
  • Flexible PTO and casual work environment.
  • Support for family-forming and mental wellness.
  • Catered lunches in office and data center locations.
  • Opportunities for equity awards and employee stock purchase programs.
