Staff Software Engineer (AI Inference)
Описание вакансии
TL;DR
Staff Software Engineer (AI Inference): Building and operating a Kubernetes-native inference platform powering low-latency, high-throughput AI workloads at massive scale, with a focus on request routing, scheduling, GPU resource management, and system-wide optimizations. The role centers on leading cross-team design initiatives, optimizing inference performance (latency, throughput, GPU utilization), and improving reliability across distributed systems.
Location: Hybrid, prioritizing offices in Sunnyvale, CA or Bellevue, WA; remote considered for candidates located more than 30 miles from an office, with onboarding at hubs and quarterly team gatherings. Must be a U.S. person (citizen, permanent resident, refugee, or asylee) or otherwise eligible to access export-controlled information without additional government authorization.
Salary: $188,000–$275,000 base + bonus, equity, benefits
Company
The company is the essential cloud for AI: a publicly traded (Nasdaq: CRWV) platform delivering infrastructure for AI labs, startups, and enterprises.
What you will do
- Lead cross-team architecture and design initiatives for inference services.
- Optimize inference performance including latency, throughput, and GPU utilization.
- Improve system reliability at scale using metrics-driven approaches.
- Design and operate distributed systems with Kubernetes orchestration and scheduling.
- Implement batching, micro-batching, caching, and memory optimizations for inference.
- Influence engineering direction across multiple teams and services.
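To make the batching/micro-batching bullet concrete, here is a minimal sketch of request micro-batching for an inference server. This is illustrative only, not the company's actual stack: `collect_batch` and `fake_infer` are hypothetical names, and a real system would batch against a GPU model call rather than a string function.

```python
import time
from queue import Queue, Empty

def collect_batch(q, max_batch, max_wait_s):
    """Pull up to max_batch requests from q, waiting at most max_wait_s
    after the first request arrives. Batching amortizes fixed per-call
    overhead (kernel launches, weight reads) across requests, which is
    what raises GPU throughput; max_wait_s caps the extra latency any
    single request pays for it."""
    batch = [q.get()]  # block until at least one request arrives
    deadline = time.monotonic() + max_wait_s
    while len(batch) < max_batch:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(q.get(timeout=remaining))
        except Empty:
            break
    return batch

def fake_infer(inputs):
    """Stand-in for a batched model call: tags each input."""
    return [f"out:{x}" for x in inputs]

q = Queue()
for i in range(10):
    q.put(f"req{i}")
batch = collect_batch(q, max_batch=4, max_wait_s=0.005)
print(fake_infer(batch))  # first four requests served in one model call
```

Tuning `max_batch` and `max_wait_s` is exactly the latency/throughput trade-off the role description refers to: larger batches raise utilization, while the wait bound protects tail latency.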
Requirements
- 8–12+ years building/operating large-scale distributed systems or cloud platforms.
- Proven track record leading cross-team technical initiatives.
- Strong programming skills in Go, Python, or C++.
- Deep Kubernetes expertise at production scale (orchestration, scheduling, service design).
- Solid grounding in distributed systems, networking, and performance optimization.
- Experience with low-latency/high-throughput systems operating under strict P95/P99 latency targets.
- Hands-on with inference systems (batching, caching, memory optimization).
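On the P95/P99 requirement above: tail percentiles, not means, are what latency SLOs are written against, because an average hides the occasional stall (batch queueing, GC pauses, cold caches). A minimal nearest-rank percentile sketch, with an illustrative sample set:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: smallest observed value such that at
    least p% of observations are <= it."""
    ranked = sorted(samples)
    k = max(0, math.ceil(p / 100 * len(ranked)) - 1)
    return ranked[k]

# Hypothetical per-request latencies in ms; one tail stall at 220 ms.
latencies_ms = [12, 15, 11, 14, 13, 220, 16, 12, 13, 14]
print(percentile(latencies_ms, 50))  # median looks healthy: 13
print(percentile(latencies_ms, 99))  # the tail exposes the stall: 220
```

The median here is unremarkable while P99 is ~17x higher, which is why "strict P95/P99" systems are engineered around eliminating tail events rather than improving averages.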
Nice to have
- Inference frameworks: vLLM, Triton, TensorRT-LLM, Ray Serve, TorchServe.
- GPU optimization: CUDA, NCCL, RDMA, NUMA, interconnects.
- Mixed precision (BF16, FP8), streaming inference.
- Large-scale AI/ML infrastructure or hyperscale clouds.
Culture & Benefits
- Hybrid workplace with flexible PTO, catered lunches in offices/data centers.
- Comprehensive benefits: 100% paid medical/dental/vision, 401(k) match, HSA/FSA, life/disability insurance.
- Family support: paid parental leave, childcare, family-forming via Carrot, mental wellness.
- Professional growth: tuition reimbursement, ESPP, casual innovative culture.
- Core values: curiosity, ownership, empowerment, client focus, collaboration.