Senior Machine Learning Engineer, Runtime and Serving (AI)
Job description
TL;DR
Senior Machine Learning Engineer (AI/Runtime): design and develop high-performance ML runtime and serving systems for onboard autonomous vehicle compute and offboard data centers, with an emphasis on a JAX-native architecture and hardware-aware compute optimizations. The focus is on migrating ML workloads to OpenXLA/PjRT, optimizing for GPUs/TPUs, and building robust profiling tools to eliminate system-level bottlenecks.
Location: Onsite/Hybrid in Mountain View, California
Salary: $213,000–$263,000 USD
Company
An autonomous driving technology company building the Driver to improve mobility and safety through fully autonomous ride-hail services.
What you will do
- Architect and develop an efficient, high-performance ML runtime and serving system for both onboard autonomous vehicle compute and offboard data center environments.
- Lead the integration and feature development for ML inference runtimes, balancing real-time latency and memory constraints with high-throughput demands.
- Drive the strategic migration of ML workloads toward a JAX-native runtime architecture, extending underlying ML compilers like OpenXLA/PjRT and TensorRT.
- Collaborate with perception, planning, and research teams to analyze system-level ML workloads and apply hardware-aware compute optimizations.
- Design and build robust tooling for profiling and benchmarking to identify system-level bottlenecks across the end-to-end ML software stack (illustrated in the sketch after this list).
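To make the profiling and benchmarking bullet concrete, here is a minimal, hypothetical JAX sketch of the micro-benchmark pattern such tooling builds on: warm up a jitted function so compilation is excluded, then time steady-state latency with `block_until_ready` so JAX's asynchronous dispatch doesn't make the loop look free. The model, shapes, and iteration count are illustrative assumptions, not details from the posting.

```python
import time

import jax
import jax.numpy as jnp

# Hypothetical stand-in for an inference workload; any jitted fn works here.
@jax.jit
def step(params, x):
    return jnp.tanh(x @ params)

params = jax.random.normal(jax.random.PRNGKey(0), (512, 512))
x = jax.random.normal(jax.random.PRNGKey(1), (64, 512))

# Warm-up run triggers compilation so it is excluded from the timing.
step(params, x).block_until_ready()

# Steady-state latency: block_until_ready waits for the device to finish,
# otherwise we would measure only async dispatch, not the actual compute.
t0 = time.perf_counter()
for _ in range(100):
    out = step(params, x)
out.block_until_ready()
print(f"mean step latency: {(time.perf_counter() - t0) / 100 * 1e3:.3f} ms")
```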
Requirements
- B.S. or M.S. in CS, EE, Deep Learning or a related field.
- 5+ years of professional software engineering experience focused on building, scaling, or maintaining ML systems and infrastructure.
- 5+ years of production programming in C++.
- 3+ years of production experience in Python and major deep learning frameworks (e.g., PyTorch, JAX).
- Experience optimizing ML software for hardware accelerators such as GPUs, TPUs, or custom silicon (one flavor is sketched after this list).
- Experience building low-latency, highly concurrent distributed backend systems.
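As one concrete flavor of optimizing ML software for accelerators, the sketch below uses JAX buffer donation: donating an input lets XLA alias that buffer to the output, trimming peak device memory for update-in-place patterns. The function and shapes are hypothetical; on backends that don't support donation, JAX ignores the hint with a warning.

```python
import jax
import jax.numpy as jnp

def sgd_step(state, grad):
    # Functionally pure update; donation below makes it in-place on device.
    return state - 0.01 * grad

# donate_argnums=0 tells XLA it may reuse `state`'s buffer for the output.
sgd_step = jax.jit(sgd_step, donate_argnums=0)

state = jnp.zeros((1024, 1024))
grad = jnp.ones((1024, 1024))
state = sgd_step(state, grad)  # the old `state` buffer is consumed here
print(state.sum())
```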
Nice to have
- PhD in CS, EE, Deep Learning or a related field.
- Experience modifying ML compilers, runtimes, or inference engines (e.g., TensorRT, ONNX Runtime, OpenXLA/PjRT, TVM).
- Experience building or scaling LLM serving systems, including distributed inference and performance optimization.
- Experience with custom kernel development using CUDA, Triton, or JAX/Pallas (see the minimal kernel sketch after this list).
- Experience architecting unified serving APIs and optimizing tensor buffer management for multi-model inference pipelines.
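On the JAX/Pallas point, a minimal custom-kernel sketch following the standard element-wise pattern (the kernel, shapes, and dtypes are illustrative assumptions): `pallas_call` launches the kernel, and each program instance reads its input refs and writes the output ref.

```python
import jax
import jax.numpy as jnp
from jax.experimental import pallas as pl

def add_kernel(x_ref, y_ref, o_ref):
    # Refs are mutable views into device memory; [...] reads/writes a block.
    o_ref[...] = x_ref[...] + y_ref[...]

@jax.jit
def add(x, y):
    return pl.pallas_call(
        add_kernel,
        out_shape=jax.ShapeDtypeStruct(x.shape, x.dtype),
    )(x, y)

x = jnp.arange(1024, dtype=jnp.float32)
y = jnp.full((1024,), 2.0, dtype=jnp.float32)
print(add(x, y)[:4])  # [2. 3. 4. 5.]
```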
Culture & Benefits
- Discretionary annual bonus program.
- Equity incentive plan.
- Generous Company benefits program.