Staff Software Engineer (AI Inference)
Описание вакансии
TL;DR
Staff Software Engineer (AI Inference): Building and operating a Kubernetes-native inference platform powering low-latency, high-throughput AI workloads at massive scale, with a focus on request routing, scheduling, GPU resource management, and system-wide optimizations. The role centers on leading cross-team design initiatives, optimizing inference performance (latency, throughput, GPU utilization), and improving reliability across distributed systems.
Location: Hybrid, prioritizing offices in Sunnyvale, CA or Bellevue, WA; remote considered for candidates located more than 30 miles from an office, with onboarding at hubs and quarterly team gatherings. Must be a U.S. person (citizen, permanent resident, refugee, or asylee) or otherwise eligible to access export-controlled information without additional government authorization.
Salary: $188,000–$275,000 base + bonus, equity, benefits
Company
The company is the essential cloud for AI: a publicly traded (Nasdaq: CRWV) platform delivering infrastructure for AI labs, startups, and enterprises.
What you will do
- Lead cross-team architecture and design initiatives for inference services.
- Optimize inference performance including latency, throughput, and GPU utilization.
- Improve system reliability at scale using metrics-driven approaches.
- Design and operate distributed systems with Kubernetes orchestration and scheduling.
- Implement batching, micro-batching, caching, and memory optimizations for inference.
- Influence engineering direction across multiple teams and services.
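To make the batching/micro-batching bullet concrete, here is a minimal sketch of request micro-batching for an inference server. This is illustrative only, not the company's actual stack: `collect_batch` and `fake_infer` are hypothetical names, and a real system would batch against a GPU model call rather than a string function.

```python
import time
from queue import Queue, Empty

def collect_batch(q, max_batch, max_wait_s):
    """Pull up to max_batch requests from q, waiting at most max_wait_s
    after the first request arrives. Batching amortizes fixed per-call
    overhead (kernel launches, weight reads) across requests, which is
    what raises GPU throughput; max_wait_s caps the extra latency any
    single request pays for it."""
    batch = [q.get()]  # block until at least one request arrives
    deadline = time.monotonic() + max_wait_s
    while len(batch) < max_batch:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(q.get(timeout=remaining))
        except Empty:
            break
    return batch

def fake_infer(inputs):
    """Stand-in for a batched model call: tags each input."""
    return [f"out:{x}" for x in inputs]

q = Queue()
for i in range(10):
    q.put(f"req{i}")
batch = collect_batch(q, max_batch=4, max_wait_s=0.005)
print(fake_infer(batch))  # first four requests served in one model call
```

Tuning `max_batch` and `max_wait_s` is exactly the latency/throughput trade-off the role description refers to: larger batches raise utilization, while the wait bound protects tail latency.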
Requirements
- 8–12+ years building/operating large-scale distributed systems or cloud platforms.
- Proven track record leading cross-team technical initiatives.
- Strong programming skills in Go, Python, or C++.
- Deep Kubernetes expertise at production scale (orchestration, scheduling, service design).
- Solid grounding in distributed systems, networking, and performance optimization.
- Experience with low-latency/high-throughput systems operating under strict P95/P99 latency targets.
- Hands-on with inference systems (batching, caching, memory optimization).
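On the P95/P99 requirement above: tail percentiles, not means, are what latency SLOs are written against, because an average hides the occasional stall (batch queueing, GC pauses, cold caches). A minimal nearest-rank percentile sketch, with an illustrative sample set:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: smallest observed value such that at
    least p% of observations are <= it."""
    ranked = sorted(samples)
    k = max(0, math.ceil(p / 100 * len(ranked)) - 1)
    return ranked[k]

# Hypothetical per-request latencies in ms; one tail stall at 220 ms.
latencies_ms = [12, 15, 11, 14, 13, 220, 16, 12, 13, 14]
print(percentile(latencies_ms, 50))  # median looks healthy: 13
print(percentile(latencies_ms, 99))  # the tail exposes the stall: 220
```

The median here is unremarkable while P99 is ~17x higher, which is why "strict P95/P99" systems are engineered around eliminating tail events rather than improving averages.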
Nice to have
- Inference frameworks: vLLM, Triton, TensorRT-LLM, Ray Serve, TorchServe.
- GPU optimization: CUDA, NCCL, RDMA, NUMA, interconnects.
- Mixed precision (BF16, FP8), streaming inference.
- Large-scale AI/ML infrastructure or hyperscale clouds.
Culture & Benefits
- Hybrid workplace with flexible PTO, catered lunches in offices/data centers.
- Comprehensive benefits: 100% paid medical/dental/vision, 401(k) match, HSA/FSA, life/disability insurance.
- Family support: paid parental leave, childcare, family-forming via Carrot, mental wellness.
- Professional growth: tuition reimbursement, ESPP, casual innovative culture.
- Core values: curiosity, ownership, empowerment, client focus, collaboration.