Updated 21 hours ago

Member of Technical Staff, Pre-Training Infrastructure (AI)

$139,900 - $331,200

Work format

hybrid

Employment type

full-time

Grade

senior/principal

English

Country

Job from Hirify Global, a list of international tech companies

Job description

TL;DR

Member of Technical Staff, Pre-Training Infrastructure (AI): Build and optimize training infrastructure for frontier-scale AI models, advancing research toward humanist superintelligence, with an emphasis on GPU clusters, high-throughput storage systems, and distributed training parallelism. The role centers on scaling the latest research recipes, implementing new forms of distributed training, and ensuring the reliability and performance of thousands of GPUs across a supercomputing fleet.

Location: Hybrid in Mountain View, United States. Expected to work from a designated Microsoft office at least four days a week if living within 50 miles.

Salary: USD $139,900 – $331,200 per year

Company

Microsoft AI is dedicated to advancing Copilot and other consumer AI products and research, including Bing, Edge, and generative AI research.

What you will do

Design, implement, test, and optimize distributed training infrastructure in Python and C++ for large-scale GPU clusters.
Profile, benchmark, and debug performance bottlenecks across compute, memory, networking, and storage subsystems.
Optimize collective communication libraries (e.g., NCCL) for emerging NVLink and InfiniBand topologies.
Collaborate with hardware teams to optimize for next-generation accelerators.
Actively contribute to the development of AI models powering innovative products.
Drive architectural changes and influence the roadmap for relevant software and hardware components.

Requirements

Bachelor’s Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
Experience in distributed computing and large-scale systems.
Experience with GPU programming (CUDA, NCCL) and frameworks such as PyTorch.
Proven ability to profile, benchmark, and optimize performance-critical systems.
Experience building infrastructure for large-scale machine learning or generative AI workloads.
Experience in networking (InfiniBand, NVLink), storage systems, or distributed training parallelism.
English: B2 required.

Culture & Benefits

Abundance of positive energy, empathy, and kindness.
Growth mindset, innovation, and collaboration.
Values of respect, integrity, and accountability to create a culture of inclusion.
Eligibility for benefits and other compensation (detailed information available on careers.microsoft.com).

Similar jobs

Member Of Technical Staff (AI)

Member Of Technical Staff, Software Co-Design AI HPC Systems (AI Engineering)

Member Of Technical Staff, Reinforcement Learning Systems (AI)

Member Of Technical Staff, LLM Inference (AI Engineering)

Staff Software Engineer (Deep Learning Acceleration)

Staff Software Engineer (Machine Learning)
