Senior Software Engineer (Machine Learning Inference Platform)
Job description
TL;DR
Senior Software Engineer (Machine Learning Inference Platform): Own meaningful subsystems of an inference platform and drive them from design to production, with an emphasis on high-throughput, low-latency, multi-tenant inference workloads. The role centers on building serving APIs and SDKs, hardening a multi-tenant control plane (metering, rate limiting, quotas, tenant isolation), and optimizing performance and observability (SLOs, GPU utilization, cost accounting).
Location: Pittsburgh, PA or Remote
Company
Stack develops AI and autonomous systems that improve safety, reliability, and efficiency in modern operations, including trucking.
What you will do
- Design and deliver subsystems for a high-throughput, low-latency, multi-tenant inference platform for enterprise workloads.
- Build robust API layers (gRPC, WebSockets, REST) and developer SDKs for reliable token streaming and distributed inference orchestration.
- Develop and harden a multi-tenant control plane for metering, rate limiting, quotas, tenant isolation, and noisy-neighbor fairness.
- Optimize inference performance across the system stack, including the model engine layer.
- Implement observability and SLOs to track system economics, including cache-hit rates, GPU utilization, and per-model and per-tenant cost accounting.
- Partner with product and infrastructure teams on model onboarding, capacity planning, external API contracts, and customer adoption.
Requirements
- 4+ years of experience building and operating distributed backend systems end to end.
- Strong fundamentals in data-intensive distributed systems, concurrency, networking, and performance profiling.
- Hands-on experience with large-scale GPU inference services, including KV caches and the throughput/latency trade-offs between prefill and decode.
- Direct experience with inference engines (TensorRT, vLLM, or similar) or serving frameworks (Dynamo, Triton, or equivalent).
- Strong programming skills in C++, Go, Rust, or Python.
- Experience with GPU computing primitives (CUDA, NCCL, NVLink) and low-latency networking for cluster communication (InfiniBand, RoCE).
Nice to have
- Experience with autonomous vehicles (AV).
Culture & Benefits
- Equal opportunity workplace focused on inclusion, entrepreneurship, and innovation.
- Mentoring and code quality practices to raise the engineering bar.
- Work spans design, production debugging, and continuous delivery of inference platform subsystems.
Hiring process
- Interviews and technical evaluation focused on distributed systems, inference serving, and production problem-solving.
- Assessment may include eligibility checks related to U.S. national security and export-control requirements.