Staff ML Engineer (Generative Model Performance & Efficiency)
Job description
TL;DR
Staff ML Engineer (Generative Model Performance & Efficiency): Analyze and optimize generative model training and inference for low-latency, high-throughput serving, with an emphasis on performance bottlenecks, model compression, and scalable distributed execution. Focus on designing efficient serving and training pipelines, experimenting with partitioning and sharding strategies, and building tooling for profiling and debugging ML workloads.
Company
Waymo builds autonomous driving technology and the Waymo Driver for fully autonomous ride-hail.
What you will do
- Analyze model architectures to identify bottlenecks in training and inference performance (memory bandwidth, compute, communication).
- Develop and apply techniques for efficiency, including quantization (FP8/INT4), pruning, knowledge distillation, and efficient attention mechanisms.
- Optimize model code for hardware accelerators (TPUs/GPUs) using compiler technology such as XLA and low-level libraries.
- Experiment with model partitioning and sharding strategies (data, tensor, pipeline parallelism, expert parallelism) to improve scalability and efficiency.
- Design and implement low-latency, high-throughput serving solutions for generative models and optimize training pipelines to reduce training time.
- Build and maintain tools for performance analysis, profiling, and debugging (e.g., xprof).
Requirements
- MS or PhD in Computer Science, Machine Learning, Robotics, or a related field.
- 5+ years of experience with deep learning architectures (Transformers, Diffusion Models, MoEs) and optimization techniques.
- Proficiency in JAX and Flax; experience with TensorFlow/PyTorch is a plus.
- Expertise using profiling tools (XProf, Perfetto, NVIDIA Nsight) to diagnose performance issues in ML workloads.
- Hands-on experience with quantization, pruning, distillation, and other model compression methods.
- Strong programming skills in Python (C++ is a plus) and familiarity with software development best practices.
Culture & Benefits
- On-site role in Mountain View, California.
- Discretionary annual bonus program and equity incentive plan (subject to eligibility).
- Generous company benefits program (subject to eligibility).
Hiring process
- Recruiter shares the specific salary range for the role location (or preferred location if remote is possible) during the hiring process.
Location: On-site, Mountain View, California
Salary: $251,000 to $310,000 USD (base salary range across US locations)