Staff Software Engineer (Machine Learning Inference Platform)
Job description
TL;DR
Staff Software Engineer (Machine Learning Inference Platform): Define and drive the architecture of a high-throughput, low-latency, multi-tenant ML inference platform, with an emphasis on serving, control plane, observability, capacity/tenant isolation, and model-engine integration. The role focuses on building robust distributed inference orchestration and developer-facing APIs/SDKs while optimizing GPU inference performance and system economics.
Location: Pittsburgh, PA or Remote
Company
Stack develops AI and autonomous systems that improve safety, reliability, and efficiency in modern operations, including trucking.
What you will do
- Design platform architecture for multi-tenant inference workloads across serving, orchestration, control plane, APIs, SDKs, observability, and model-engine integration.
- Develop API layers (gRPC, WebSockets, REST) and developer SDKs that provide reliable token streams and abstract distributed orchestration complexity.
- Build and harden a multi-tenant control plane for metering, rate limiting, quotas, tenant isolation, and noisy-neighbor fairness.
- Optimize end-to-end inference performance, including the model engine layer, and tune throughput/latency trade-offs.
- Implement observability and SLOs covering system economics, cache-hit rates, GPU utilization, and cost accounting per model and tenant.
- Partner with product and infrastructure teams on model onboarding, capacity planning, external API contracts, and customer adoption.
Requirements
- 7+ years of experience building and operating backend distributed systems end to end.
- Demonstrated cross-team technical leadership in backend distributed systems, ML infrastructure, inference serving, or high-performance compute platforms.
- Strong fundamentals in data-intensive distributed systems, concurrency, networking, and performance profiling.
- Hands-on experience running large-scale GPU inference services, including KV caches and prefill/decode stages.
- Experience with inference engines/serving frameworks such as TensorRT, vLLM, Dynamo, Triton, or equivalent.
- Strong programming skills in C++, Go, Rust, or Python; familiarity with deep learning frameworks (e.g., PyTorch) and GPU primitives (CUDA, NCCL, NVLink).
Nice to have
- Experience with autonomous vehicles (AV).
Culture & Benefits
- Equal opportunity workplace focused on inclusion, entrepreneurship, and innovation.
- High bar for engineering excellence, including shaping that culture within the team.
Hiring process
- Application may be contingent on verifying residence, U.S. person status, and/or citizenship status due to U.S. national security and export control requirements.