Company hidden
Posted 8 hours ago

Senior Software Engineer (Machine Learning Inference Platform)

Work format
remote (USA only)
Employment type
fulltime
Grade
senior
English
b2
Country
US
Job from Hirify Global, a list of international tech companies

Job description

TL;DR

Senior Software Engineer (Machine Learning Inference Platform): Own meaningful subsystems of an inference platform and drive them from design through production, with an emphasis on high-throughput, low-latency, multi-tenant inference workloads. The role focuses on building serving APIs and SDKs, hardening a multi-tenant control plane (metering, rate limiting, quotas, tenant isolation), and improving performance and observability (SLOs, GPU utilization, cost accounting).

Location: Pittsburgh, PA or Remote

Company

Stack develops AI and autonomous systems that improve safety, reliability, and efficiency in modern operations, including trucking.

What you will do

  • Design and deliver subsystems for a high-throughput, low-latency, multi-tenant inference platform for enterprise workloads.
  • Build robust API layers (gRPC, WebSockets, REST) and developer SDKs for reliable token streaming and distributed inference orchestration.
  • Develop and harden a multi-tenant control plane for metering, rate limiting, quotas, tenant isolation, and noisy-neighbor fairness.
  • Optimize inference performance across the system stack, including the model engine layer.
  • Implement observability and SLOs to track system economics, cache-hit rates, GPU utilization, and cost accounting per model and tenant.
  • Partner with product and infrastructure teams on model onboarding, capacity planning, external API contracts, and customer adoption.

Requirements

  • 4+ years of experience building and operating backend distributed systems end to end.
  • Strong fundamentals in data-intensive distributed systems, concurrency, networking, and performance profiling.
  • Hands-on experience with large-scale GPU inference services, including KV caches and prefill/decode throughput/latency trade-offs.
  • Direct experience with inference engines (TensorRT, vLLM, etc.) or serving frameworks (Dynamo, Triton, or equivalent).
  • Strong programming skills in C++, Go, Rust, or Python.
  • Experience with GPU computing primitives (CUDA, NCCL, NVLink) and low-latency networking (InfiniBand, RoCE, cluster communication).

Nice to have

  • Autonomous vehicles (AV) experience.

Culture & Benefits

  • Equal opportunity workplace focused on inclusion, entrepreneurship, and innovation.
  • Mentoring and code quality practices to raise the engineering bar.
  • Work spans design, production debugging, and continuous delivery of inference platform subsystems.

Hiring process

  • Interviews and technical evaluation focused on distributed systems, inference serving, and production problem-solving.
  • Assessment may include eligibility checks related to U.S. national security and export-control requirements.
