Member of Technical Staff, Pre-Training Infrastructure (AI)

$139,900 – $331,200

Work format

hybrid

Employment type

full-time

Level

senior

English

Country

Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

Member of Technical Staff, Pre-Training Infrastructure (AI): Building and optimizing distributed training infrastructure for frontier-scale AI models, with an emphasis on massive GPU clusters and high-throughput storage systems. The focus is on scaling research recipes, implementing distributed training parallelism, and ensuring the reliability and performance of supercomputing fleets.

Location: Redmond, United States. Employees are expected to work from a designated Microsoft office at least four days a week if they live within 50 miles (U.S.).

Salary: USD $139,900 – $331,200 per year

Company

Microsoft AI is dedicated to advancing Copilot and other consumer AI products and research.

What you will do

Design, implement, test, and optimize distributed training infrastructure in Python and C++ for large-scale GPU clusters.
Profile, benchmark, and debug performance bottlenecks across compute, memory, networking, and storage subsystems.
Optimize collective communication libraries (e.g., NCCL) for emerging NVLink and InfiniBand topologies.
Collaborate with hardware teams to optimize for next-generation accelerators.
Gather data and insights to develop the pre-training compute roadmap.
Actively contribute to the development of AI models powering innovative products.

Requirements

Bachelor’s Degree in Computer Science or related technical field AND 6+ years of technical engineering experience with coding in C, C++, C#, Java, JavaScript, or Python, OR equivalent experience.
Experience in distributed computing and large-scale systems.
Experience with GPU programming (CUDA, NCCL) and frameworks such as PyTorch.
Proven ability to profile, benchmark, and optimize performance-critical systems.
Experience building infrastructure for large-scale machine learning or generative AI workloads.

Nice to have

Master’s Degree in Computer Science or related technical field AND 8+ years of experience, OR Bachelor’s Degree AND 12+ years of experience.
Experience in leading technical projects and supporting architectural decisions with data.
Experience in networking (InfiniBand, NVLink), storage systems, or distributed training parallelism strategies.
Track record of contributing to high-performance computing or large-scale AI infrastructure projects.

Culture & Benefits

Be part of the team shaping the future of personal computing with Copilot, Bing, Edge, and generative AI research.
Work with a growth mindset, innovate to empower others, and collaborate to realize shared goals.
Build on values of respect, integrity, and accountability to create a culture of inclusion.
