Company hidden
Posted 18 hours ago

Senior Machine Learning Engineer, Runtime and Serving (AI)

$213,000 – $263,000
Work format
onsite/hybrid
Employment type
full-time
Seniority
senior
English
B2
Country
US
Vacancy from Hirify.Global, a list of international tech companies

Job description


TL;DR

Senior Machine Learning Engineer (AI/Runtime): Design and develop high‑performance ML runtime and serving systems for onboard autonomous-vehicle compute and offboard data centers, with an emphasis on a JAX-native architecture and hardware-aware compute optimizations. Focus on migrating ML workloads to OpenXLA/PjRT, optimizing for GPUs/TPUs, and building robust profiling tools to eliminate system-level bottlenecks.
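To make the "JAX-native" theme concrete: in JAX, model code is traced once by `jax.jit` and compiled through XLA, so the same function can target CPU, GPU, or TPU backends. A minimal sketch, assuming a standard `jax` installation; the function and shapes here are illustrative, not the company's actual stack:

```python
import jax
import jax.numpy as jnp

# jax.jit traces the Python function into an XLA computation and compiles
# it for the available backend (CPU/GPU/TPU) on first call.
@jax.jit
def dense_relu(x, w, b):
    return jax.nn.relu(x @ w + b)

x = jnp.ones((8, 16))
w = jnp.ones((16, 4))
b = jnp.zeros((4,))

out = dense_relu(x, w, b)
print(out.shape)  # (8, 4)
```

Subsequent calls with the same input shapes reuse the compiled executable, which is what makes the compile-once/serve-many model attractive for latency-sensitive inference.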

Location: Onsite/Hybrid in Mountain View, California

Salary: $213,000 – $263,000 USD

Company

hirify.global is an autonomous driving technology company building the hirify.global Driver to improve mobility and safety through fully autonomous ride-hail services.

What you will do

  • Architect and develop an efficient, high-performance ML runtime and serving system for both onboard autonomous vehicle compute and offboard data center environments.
  • Lead the integration and feature development for ML inference runtimes, balancing real-time latency and memory constraints with high-throughput demands.
  • Drive the strategic migration of ML workloads toward a JAX-native runtime architecture, extending underlying ML compilers like OpenXLA/PjRT and TensorRT.
  • Collaborate with perception, planner, and research teams to analyze system-level ML workloads and apply hardware-aware compute optimizations.
  • Design and build robust tooling for profiling and benchmarking to identify system-level bottlenecks across the end-to-end ML software stack.
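The profiling-and-benchmarking responsibility above typically starts with a small latency harness that reports tail percentiles, since real-time serving cares about p99 as much as the median. A minimal sketch in plain Python with hypothetical names, not the team's actual tooling:

```python
import time
import statistics

def benchmark(fn, *args, warmup=10, iters=100):
    """Measure per-call latency of fn; return p50/p99 in milliseconds."""
    for _ in range(warmup):          # warm caches / JIT before timing
        fn(*args)
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn(*args)
        samples.append((time.perf_counter() - t0) * 1e3)
    samples.sort()
    return {
        "p50_ms": statistics.median(samples),
        "p99_ms": samples[int(0.99 * (len(samples) - 1))],
    }

# Example: benchmark a toy stand-in for an inference call.
stats = benchmark(lambda n: sum(i * i for i in range(n)), 10_000)
print(sorted(stats))  # ['p50_ms', 'p99_ms']
```

A production harness would additionally pin CPU affinity, separate cold-start from steady-state, and attribute time across the stack (pre/post-processing vs. accelerator kernels), but the percentile-first shape is the same.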

Requirements

  • B.S. or M.S. in CS, EE, Deep Learning or a related field.
  • 5+ years of professional software engineering experience focused on building, scaling, or maintaining ML systems and infrastructure.
  • 5+ years of production programming in C++.
  • 3+ years of production experience in Python and major deep learning frameworks (e.g., PyTorch, JAX).
  • Experience optimizing ML software for hardware accelerators such as GPUs, TPUs, or custom silicon.
  • Experience building low-latency, highly concurrent distributed backend systems.

Nice to have

  • PhD in CS, EE, Deep Learning or a related field.
  • Experience modifying ML compilers, runtimes, or inference engines (e.g., TensorRT, ONNX Runtime, OpenXLA/PjRT, TVM).
  • Experience building or scaling LLM serving systems, including distributed inference and performance optimization.
  • Experience with custom kernel development using CUDA, Triton, or JAX/Pallas.
  • Experience architecting unified serving APIs and optimizing tensor buffer management for multi-model inference pipelines.

Culture & Benefits

  • Discretionary annual bonus program.
  • Equity incentive plan.
  • Generous Company benefits program.

Be careful: if an employer asks you to log in to their system via iCloud/Google, to send a code or password, or to run code/software, do not do it; these are scammers. Be sure to click "Report" or contact support. More details in the guide.