Staff Machine Learning Engineer (ML Infrastructure)
Job Description
TL;DR
Staff Machine Learning Engineer (ML Infrastructure): Design and build scalable AI/ML platform infrastructure that supports advanced AI research and intelligent driving technologies, with an emphasis on distributed training and resource optimization. The role focuses on developing high-performance ML frameworks, maximizing GPU utilization, and ensuring system observability for large-scale model training.
Location: Remote (must be based in the USA), with willingness to travel to Sunnyvale, CA, as needed.
Salary: $185,000 – $335,300
Company
Global automotive leader driving the transition to zero crashes, zero emissions, and zero congestion through intelligent driving technologies.
What you will do
- Design and develop scalable, reliable, high-performance ML frameworks to support model training at scale.
- Optimize distributed training workflows to maximize resource utilization across heterogeneous hardware and reduce costs.
- Enhance system observability, debuggability, and operational excellence to improve user experience.
- Collaborate with research scientists and ML engineers to integrate new features and technologies into the platform.
Requirements
- Bachelor's degree or higher in Computer Science or equivalent relevant experience.
- 5+ years of professional software engineering experience.
- 3+ years of specialized experience in AI/ML infrastructure, specifically enabling distributed training for large models.
- Strong programming skills in Python and proficiency in PyTorch, TensorFlow, or similar frameworks.
- Experience with distributed computing, GPU computing, and cloud environments (AWS, GCP, Azure).
- Must be based in the USA.
Nice to have
- Extensive knowledge of PyTorch 2.x or later and distributed training frameworks.
- Experience with FSDP, pipeline parallelism, and scalable solutions for training large foundation models.
- Expertise in profiling, analysis, and debugging of training and data loading performance.
- Strong communication skills for resolving technical conflicts and driving consensus.
Culture & Benefits
- Comprehensive health and wellbeing programs including medical, dental, vision, HSA, and FSA.
- Retirement savings plan, sickness, and accident benefits.
- Paid vacation, holidays, and tuition assistance programs.
- Eligibility for the company vehicle evaluation program.
- Flexible remote work arrangement within the USA.