Company hidden
14 hours ago

Staff Machine Learning Engineer (ML Infrastructure)

$185,000 – $335,300
Work format
remote (USA only)
Employment type
fulltime
Grade
senior
English
b2
Country
US
Relocation
US
Listing from Hirify Global, a list of international tech companies

Job description

TL;DR

Staff Machine Learning Engineer (ML Infrastructure): Designing and building scalable AI/ML platform infrastructure to support advanced AI research and intelligent driving technologies, with an emphasis on distributed training and resource optimization. Focus on developing high-performance ML frameworks, maximizing GPU utilization, and ensuring system observability for large-scale model training.

Location: Remote (Must be based in the USA). Willingness to travel to Sunnyvale, CA as needed.

Salary: $185,000 – $335,300

Company

Global automotive leader driving the transition to zero crashes, zero emissions, and zero congestion through intelligent driving technologies.

What you will do

  • Design and develop scalable, reliable, high-performance ML frameworks to support model training at scale.
  • Optimize distributed training workflows to maximize resource utilization across heterogeneous hardware and reduce costs.
  • Enhance system observability, debuggability, and operational excellence to improve user experience.
  • Collaborate with research scientists and ML engineers to integrate new features and technologies into the platform.

Requirements

  • Bachelor's degree or higher in Computer Science or equivalent relevant experience.
  • 5+ years of professional software engineering experience.
  • 3+ years of specialized experience in AI/ML infrastructure, specifically enabling distributed training for large models.
  • Strong programming skills in Python and proficiency in PyTorch, TensorFlow, or similar frameworks.
  • Experience with distributed computing, GPU computing, and cloud environments (AWS, GCP, Azure).
  • Must be based in the USA.

Nice to have

  • Extensive knowledge of PyTorch 2.x+ and distributed training frameworks.
  • Experience with FSDP, Pipeline Parallelism, and scalable solutions for training large foundation models.
  • Expertise in profiling, analysis, and debugging of training and data loading performance.
  • Strong communication skills for resolving technical conflicts and driving consensus.

Culture & Benefits

  • Comprehensive health and wellbeing programs including medical, dental, vision, HSA, and FSA.
  • Retirement savings plan, plus sickness and accident benefits.
  • Paid vacation, holidays, and tuition assistance programs.
  • Eligibility for the company vehicle evaluation program.
  • Flexible remote work arrangement within the USA.
