Company hidden
Posted 18 hours ago

Member of Technical Staff, Pre-Training Infrastructure (AI)

Salary: $139,900 - $331,200
Work format: hybrid
Employment type: full-time
Grade: senior/principal
English: B2
Country: US

Job description

TL;DR

Member of Technical Staff, Pre-Training Infrastructure (AI): Building and optimizing training infrastructure for frontier-scale AI models, advancing research toward humanist superintelligence with an emphasis on GPU clusters, high-throughput storage systems, and distributed training parallelism. Focus on scaling the latest research recipes, implementing new forms of distributed training, and ensuring the reliability and performance of thousands of GPUs across a supercomputing fleet.

Location: Hybrid in Mountain View, United States. Employees living within 50 miles of a designated Microsoft office are expected to work from it at least four days a week.

Salary: USD $139,900 – $331,200 per year

Company

hirify.global is dedicated to advancing Copilot and other consumer AI products and research, including Bing, Edge, and generative AI research.

What you will do

  • Design, implement, test, and optimize distributed training infrastructure in Python and C++ for large-scale GPU clusters.
  • Profile, benchmark, and debug performance bottlenecks across compute, memory, networking, and storage subsystems.
  • Optimize collective communication libraries (e.g., NCCL) for emerging NVLink and InfiniBand topologies.
  • Collaborate with hardware teams to optimize for next-generation accelerators.
  • Actively contribute to the development of AI models powering innovative products.
  • Drive architectural changes and influence the roadmap for relevant software and hardware components.

Requirements

  • Bachelor’s Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
  • Experience in distributed computing and large-scale systems.
  • Experience with GPU programming (CUDA, NCCL) and frameworks such as PyTorch.
  • Proven ability to profile, benchmark, and optimize performance-critical systems.
  • Experience building infrastructure for large-scale machine learning or generative AI workloads.
  • Experience in networking (InfiniBand, NVLink), storage systems, or distributed training parallelism.
  • English: B2 required.

Culture & Benefits

  • Abundance of positive energy, empathy, and kindness.
  • Growth mindset, innovation, and collaboration.
  • Values of respect, integrity, and accountability to create a culture of inclusion.
  • Eligibility for benefits and other compensation (detailed information available on careers.microsoft.com).
