Posted 2 days ago

Staff ML Engineer (Generative Model Performance & Efficiency)

$251,000 - $310,000
Work format
onsite
Employment type
fulltime
Grade
lead
English
b2
Country
US
Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

Staff ML Engineer (Generative Model Performance & Efficiency): Analyze and optimize generative model training and inference for low-latency, high-throughput serving, with an emphasis on performance bottlenecks, model compression, and scalable distributed execution. Focus on designing efficient serving and training pipelines, experimenting with partitioning/sharding strategies, and building tooling for profiling and debugging ML workloads.

Company

Waymo builds autonomous driving technology and the Waymo Driver for fully autonomous ride-hail.

What you will do

  • Analyze model architectures to identify bottlenecks in training and inference performance (memory bandwidth, compute, communication).
  • Develop and apply techniques for efficiency, including quantization (FP8/INT4), pruning, knowledge distillation, and efficient attention mechanisms.
  • Optimize model code for hardware accelerators (TPUs/GPUs) using compiler features (e.g., XLA) and low-level libraries.
  • Experiment with model partitioning and sharding strategies (data, tensor, pipeline parallelism, expert parallelism) to improve scalability and efficiency.
  • Design and implement low-latency, high-throughput serving solutions for generative models and optimize training pipelines to reduce training time.
  • Build and maintain tools for performance analysis, profiling, and debugging (e.g., xprof).

Requirements

  • MS or PhD in Computer Science, Machine Learning, Robotics, or a related field.
  • 5+ years of experience with deep learning architectures (Transformers, Diffusion Models, MoEs) and optimization techniques.
  • Proficiency in JAX and Flax; experience with TensorFlow/PyTorch is a plus.
  • Expertise using profiling tools (XProf, Perfetto, NVIDIA Nsight) to diagnose performance issues in ML workloads.
  • Hands-on experience with quantization, pruning, distillation, and other model compression methods.
  • Strong programming skills in Python, and potentially C++, along with software development best practices.

Culture & Benefits

  • On-site role in Mountain View, California.
  • Discretionary annual bonus program and equity incentive plan (subject to eligibility).
  • Generous company benefits program (subject to eligibility).

Hiring process

  • Recruiter shares the specific salary range for the role location (or preferred location if remote is possible) during the hiring process.

Location: On Site — Mountain View, California

Salary: $251,000—$310,000 USD (base salary range across US locations)
