Company hidden
1 day ago

Staff Software Engineer (AI Inference)

188 000 - 275 000$
Work format
hybrid
Employment type
fulltime
Grade
senior
English
b2
Country
US
Vacancy from Hirify.Global, a list of international tech companies.

Job description


TL;DR

Staff Software Engineer (AI Inference): build and operate a Kubernetes-native inference platform powering low-latency, high-throughput AI workloads at massive scale, with an emphasis on request routing, scheduling, GPU resource management, and system-wide optimizations. The role focuses on leading cross-team design initiatives, optimizing inference performance (latency, throughput, GPU utilization), and improving reliability across distributed systems.

Location: Hybrid, prioritizing the Sunnyvale, CA and Bellevue, WA offices; remote is considered for candidates located more than 30 miles from an office, with onboarding at company hubs and quarterly team gatherings. Must be a U.S. person (citizen, permanent resident, refugee, or asylee) or otherwise eligible to access export-controlled information without authorization.

Salary: $188,000–$275,000 base + bonus, equity, benefits

Company

hirify.global is the essential cloud for AI, a publicly traded (Nasdaq: CRWV) platform delivering infrastructure for AI labs, startups, and enterprises.

What you will do

  • Lead cross-team architecture and design initiatives for inference services.
  • Optimize inference performance including latency, throughput, and GPU utilization.
  • Improve system reliability at scale using metrics-driven approaches.
  • Design and operate distributed systems with Kubernetes orchestration and scheduling.
  • Implement batching, micro-batching, caching, and memory optimizations for inference.
  • Influence engineering direction across multiple teams and services.
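The batching responsibility above (grouping requests under a latency budget) can be sketched minimally in Python; the `MicroBatcher` class and its parameters are hypothetical illustrations, not part of the posting:

```python
import time
from queue import Queue, Empty

class MicroBatcher:
    """Collects incoming requests into batches, flushing when either
    the batch fills or a small latency budget expires (sketch only)."""

    def __init__(self, max_batch_size=8, max_wait_ms=5.0):
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_ms / 1000.0
        self.queue = Queue()

    def submit(self, request):
        self.queue.put(request)

    def next_batch(self):
        """Block until one request arrives, then gather more until
        the batch is full or the wait budget is spent."""
        batch = [self.queue.get()]  # first item: wait indefinitely
        deadline = time.monotonic() + self.max_wait_s
        while len(batch) < self.max_batch_size:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                batch.append(self.queue.get(timeout=remaining))
            except Empty:
                break
        return batch

if __name__ == "__main__":
    b = MicroBatcher(max_batch_size=4, max_wait_ms=10)
    for i in range(6):
        b.submit(i)
    print(b.next_batch())  # [0, 1, 2, 3]
    print(b.next_batch())  # [4, 5]
```

The timeout on the second and subsequent `get` calls is what bounds added tail latency: a request never waits longer than `max_wait_ms` for batch-mates.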

Requirements

  • 8–12+ years building/operating large-scale distributed systems or cloud platforms.
  • Proven track record of leading cross-team technical initiatives.
  • Strong programming skills in Go, Python, or C++.
  • Deep Kubernetes expertise at production scale (orchestration, scheduling, service design).
  • Solid grounding in distributed systems, networking, and performance optimization.
  • Experience with low-latency/high-throughput systems operating under strict P95/P99 latency targets.
  • Hands-on with inference systems (batching, caching, memory optimization).
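The strict P95/P99 targets mentioned above are tail-latency percentiles; a minimal nearest-rank computation (a hypothetical helper, not from the posting) looks like this:

```python
import math

def latency_percentile(samples_ms, pct):
    """Nearest-rank percentile: the smallest observed latency such
    that at least pct% of samples are at or below it."""
    ordered = sorted(samples_ms)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based nearest rank
    return ordered[rank - 1]

if __name__ == "__main__":
    samples = [12, 15, 11, 90, 14, 13, 16, 250, 12, 15]
    print(latency_percentile(samples, 50))  # 14
    print(latency_percentile(samples, 99))  # 250
```

Note how a single slow outlier dominates P99 while leaving the median untouched, which is why tail percentiles, not averages, drive inference SLOs.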

Nice to have

  • Inference frameworks: vLLM, Triton, TensorRT-LLM, Ray Serve, TorchServe.
  • GPU optimization: CUDA, NCCL, RDMA, NUMA, interconnects.
  • Mixed precision (BF16, FP8), streaming inference.
  • Large-scale AI/ML infrastructure or hyperscale clouds.

Culture & Benefits

  • Hybrid workplace with flexible PTO, catered lunches in offices/data centers.
  • Comprehensive benefits: 100% paid medical/dental/vision, 401(k) match, HSA/FSA, life/disability insurance.
  • Family support: paid parental leave, childcare, family-forming via Carrot, mental wellness.
  • Professional growth: tuition reimbursement, ESPP, casual innovative culture.
  • Core values: curiosity, ownership, empowerment, client focus, collaboration.

Be careful: if an employer asks you to log in to their system using iCloud/Google, send them a code/password, or run code/software, do not do it; these are scammers. Be sure to click "Report" or contact support. More details in the guide →