Company hidden
4 hours ago

Staff Software Engineer (Machine Learning Inference Platform)

Work format
remote
Employment type
full-time
Grade
lead
English
B2
Country
US

Job description

TL;DR

Staff Software Engineer (Machine Learning Inference Platform): Define and drive the architecture for a high-throughput, low-latency, multi-tenant ML inference platform, with an emphasis on serving, control plane, observability, capacity and tenant isolation, and model-engine integration. The role focuses on building robust distributed inference orchestration and developer-facing APIs/SDKs while optimizing GPU inference performance and system economics.

Location: Pittsburgh, PA or Remote

Company

Stack develops AI and autonomous systems for safety, reliability, and efficiency in modern operations, including trucking.

What you will do

  • Design platform architecture for multi-tenant inference workloads across serving, orchestration, control plane, APIs, SDKs, observability, and model-engine integration.
  • Develop API layers (gRPC, WebSockets, REST) and developer SDKs that provide reliable token streams and abstract distributed orchestration complexity.
  • Build and harden a multi-tenant control plane for metering, rate limiting, quotas, tenant isolation, and noisy-neighbor fairness.
  • Optimize end-to-end inference performance, including the model engine layer, and tune throughput/latency trade-offs.
  • Implement observability and SLOs covering system economics, cache-hit rates, GPU utilization, and cost accounting per model and tenant.
  • Partner with product and infrastructure teams on model onboarding, capacity planning, external API contracts, and customer adoption.

Requirements

  • 7+ years of experience building and operating backend distributed systems end to end.
  • Demonstrated cross-team technical leadership in backend distributed systems, ML infrastructure, inference serving, or high-performance compute platforms.
  • Strong fundamentals in data-intensive distributed systems, concurrency, networking, and performance profiling.
  • Hands-on experience running large-scale GPU inference services, including KV caches and prefill/decode stages.
  • Experience with inference engines/serving frameworks such as TensorRT, vLLM, Dynamo, Triton, or equivalent.
  • Strong programming skills in C++, Go, Rust, or Python; familiarity with deep learning frameworks (e.g., PyTorch) and GPU primitives (CUDA, NCCL, NVLink).

Nice to have

  • Autonomous vehicle (AV) experience.

Culture & Benefits

  • Equal opportunity workplace focused on inclusion, entrepreneurship, and innovation.
  • High bar for engineering excellence, including setting that culture within the team.

Hiring process

  • Application may be contingent on verifying residence, U.S. person status, and/or citizenship status due to U.S. national security and export control requirements.
