ML Systems Engineer (AI)
Job description
TL;DR
ML Systems Engineer (AI/Systems): Building high-performance systems for frontier model training and inference, with an emphasis on RL feedback loops and hardware optimization. The focus is on designing low-latency RDMA weight synchronization, optimizing GPU kernels, and orchestrating large-scale clusters to accelerate scientific discovery.
Location: Menlo Park or San Francisco preferred; flexible depending on the role
Compensation: $300,000–$400,000
Company
An AI and physical sciences company building state-of-the-art models to accelerate breakthroughs across materials, energy, and beyond.
What you will do
- Build rack- and topology-aware scheduling for GB series GPUs across Ray, Slurm, and Kubernetes.
- Develop online/offline profilers to eliminate bottlenecks and implement direct S3 checkpoint streaming.
- Write and optimize communication and GPU kernels to maximize hardware throughput.
- Design and implement zero-copy RDMA weight synchronization between training and inference.
- Contribute upstream to open-source communities including SGLang, Megatron, and Ray.
- Co-design algorithms and infrastructure in close collaboration with RL and pretraining researchers.
Requirements
- Experience with large-scale inference infrastructure (load balancing, scheduling, serving architecture).
- Proficiency in low-level systems programming including RDMA, NVLink, and kernel-level optimization.
- Expertise in GPU cluster orchestration across Ray, Slurm, or Kubernetes.
- Ability to write and optimize CUDA kernels and distributed training collective operations.
- Experience profiling and benchmarking distributed ML systems across compute, memory, and network.
- Visa sponsorship is available for qualified candidates.
Culture & Benefits
- Opportunity to work on frontier-scale RL using thousands of GPUs.
- Direct impact on the pace of scientific discovery in physical sciences.
- High-ownership environment operating at the pace of the AI frontier.
- Legal and administrative support for visa sponsorship.