Company hidden
6 days ago

Member of Technical Staff, Pre-Training Infrastructure (AI)

$139,900 - $331,200
Work format: hybrid
Employment type: full-time
Grade: senior
English: B2
Country: US

Job description

TL;DR

Member of Technical Staff, Pre-Training Infrastructure (AI): Building and optimizing distributed training infrastructure for frontier-scale AI models, with an emphasis on massive GPU clusters and high-throughput storage systems. The role focuses on scaling research recipes, implementing distributed training parallelism, and ensuring the reliability and performance of supercomputing fleets.

Location: Redmond, United States. Employees are expected to work from a designated Microsoft office at least four days a week if they live within 50 miles (U.S.).

Salary: USD $139,900 – $331,200 per year

Company

hirify.global is dedicated to advancing Copilot and other consumer AI products and research.

What you will do

  • Design, implement, test, and optimize distributed training infrastructure in Python and C++ for large-scale GPU clusters.
  • Profile, benchmark, and debug performance bottlenecks across compute, memory, networking, and storage subsystems.
  • Optimize collective communication libraries (e.g., NCCL) for emerging NVLink and InfiniBand topologies.
  • Collaborate with hardware teams to optimize for next-generation accelerators.
  • Gather data and insights to develop the pretraining compute roadmap.
  • Actively contribute to the development of AI models powering innovative products.

Requirements

  • Bachelor’s Degree in Computer Science or a related technical field AND 6+ years of technical engineering experience coding in C, C++, C#, Java, JavaScript, or Python, OR equivalent experience.
  • Experience in distributed computing and large-scale systems.
  • Experience with GPU programming (CUDA, NCCL) and frameworks such as PyTorch.
  • Proven ability to profile, benchmark, and optimize performance-critical systems.
  • Experience building infrastructure for large-scale machine learning or generative AI workloads.

Nice to have

  • Master’s Degree in Computer Science or a related technical field AND 8+ years of experience, OR Bachelor’s Degree AND 12+ years of experience.
  • Experience in leading technical projects and supporting architectural decisions with data.
  • Experience in networking (InfiniBand, NVLink), storage systems, or distributed training parallelism.
  • Track record of contributing to high-performance computing or large-scale AI infrastructure projects.

Culture & Benefits

  • Be part of the team shaping the future of personal computing with Copilot, Bing, Edge, and generative AI research.
  • Work with a growth mindset, innovate to empower others, and collaborate to realize shared goals.
  • Build on values of respect, integrity, and accountability to create a culture of inclusion.
