Company hidden
Posted 18 hours ago

Senior Machine Learning Engineer, Runtime and Serving (AI)

$213,000 – $263,000
Work format
onsite/hybrid
Employment type
full-time
Seniority
senior
English
B2
Country
US
Vacancy from Hirify.Global, a list of international tech companies

Job description


TL;DR

Senior Machine Learning Engineer (AI/Runtime): Design and develop high‑performance ML runtime and serving systems for onboard autonomous-vehicle compute and offboard data centers, with an emphasis on a JAX-native architecture and hardware-aware compute optimizations. Focus on migrating ML workloads to OpenXLA/PjRT, optimizing for GPUs/TPUs, and building robust profiling tools to eliminate system-level bottlenecks.
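To make the "JAX-native" theme concrete: in JAX, model code is traced once by `jax.jit` and compiled through XLA, so the same function can target CPU, GPU, or TPU backends. A minimal sketch, assuming a standard `jax` installation; the function and shapes here are illustrative, not the company's actual stack:

```python
import jax
import jax.numpy as jnp

# jax.jit traces the Python function into an XLA computation and compiles
# it for the available backend (CPU/GPU/TPU) on first call.
@jax.jit
def dense_relu(x, w, b):
    return jax.nn.relu(x @ w + b)

x = jnp.ones((8, 16))
w = jnp.ones((16, 4))
b = jnp.zeros((4,))

out = dense_relu(x, w, b)
print(out.shape)  # (8, 4)
```

Subsequent calls with the same input shapes reuse the compiled executable, which is what makes the compile-once/serve-many model attractive for latency-sensitive inference.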

Location: Onsite/Hybrid in Mountain View, California

Salary: $213,000 – $263,000 USD

Company

hirify.global is an autonomous driving technology company building the hirify.global Driver to improve mobility and safety through fully autonomous ride-hail services.

What you will do

  • Architect and develop an efficient, high-performance ML runtime and serving system for both onboard autonomous vehicle compute and offboard data center environments.
  • Lead the integration and feature development for ML inference runtimes, balancing real-time latency and memory constraints with high-throughput demands.
  • Drive the strategic migration of ML workloads toward a JAX-native runtime architecture, extending underlying ML compilers like OpenXLA/PjRT and TensorRT.
  • Collaborate with perception, planner, and research teams to analyze system-level ML workloads and apply hardware-aware compute optimizations.
  • Design and build robust tooling for profiling and benchmarking to identify system-level bottlenecks across the end-to-end ML software stack.
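The profiling-and-benchmarking responsibility above typically starts with a small latency harness that reports tail percentiles, since real-time serving cares about p99 as much as the median. A minimal sketch in plain Python with hypothetical names, not the team's actual tooling:

```python
import time
import statistics

def benchmark(fn, *args, warmup=10, iters=100):
    """Measure per-call latency of fn; return p50/p99 in milliseconds."""
    for _ in range(warmup):          # warm caches / JIT before timing
        fn(*args)
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn(*args)
        samples.append((time.perf_counter() - t0) * 1e3)
    samples.sort()
    return {
        "p50_ms": statistics.median(samples),
        "p99_ms": samples[int(0.99 * (len(samples) - 1))],
    }

# Example: benchmark a toy stand-in for an inference call.
stats = benchmark(lambda n: sum(i * i for i in range(n)), 10_000)
print(sorted(stats))  # ['p50_ms', 'p99_ms']
```

A production harness would additionally pin CPU affinity, separate cold-start from steady-state, and attribute time across the stack (pre/post-processing vs. accelerator kernels), but the percentile-first shape is the same.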

Requirements

  • B.S. or M.S. in CS, EE, Deep Learning or a related field.
  • 5+ years of professional software engineering experience focused on building, scaling, or maintaining ML systems and infrastructure.
  • 5+ years of production programming in C++.
  • 3+ years of production experience in Python and major deep learning frameworks (e.g., PyTorch, JAX).
  • Experience optimizing ML software for hardware accelerators such as GPUs, TPUs, or custom silicon.
  • Experience building low-latency, highly concurrent distributed backend systems.

Nice to have

  • PhD in CS, EE, Deep Learning or a related field.
  • Experience modifying ML compilers, runtimes, or inference engines (e.g., TensorRT, ONNX Runtime, OpenXLA/PjRT, TVM).
  • Experience building or scaling LLM serving systems, including distributed inference and performance optimization.
  • Experience with custom kernel development using CUDA, Triton, or JAX/Pallas.
  • Experience architecting unified serving APIs and optimizing tensor buffer management for multi-model inference pipelines.

Culture & Benefits

  • Discretionary annual bonus program.
  • Equity incentive plan.
  • Generous Company benefits program.

Be careful: if an employer asks you to log in to their system via iCloud/Google, to send a code or password, or to run code/software, do not do it; these are scammers. Be sure to click "Report" or contact support. More details in the guide.