TL;DR
Member Of Technical Staff, Pre-training Infra GPU (AI): Building and optimizing AI models and their training infrastructure for large-scale GPU clusters, with an emphasis on high-performance computing, distributed systems, and generative AI. The role focuses on designing, implementing, and optimizing GPU kernels, profiling performance bottlenecks, and contributing to the pretraining compute roadmap.
Location: Hybrid, New York, United States. MAI employees are expected to work from a designated Microsoft office at least four days a week if they live within 50 miles (U.S.) of that location.
Salary: USD $188,000 – $304,200 per year (New York City metropolitan area, for IC5) and USD $220,800 – $331,200 per year (New York City metropolitan area, for IC6).
Company
hirify.global is an organization dedicated to advancing Copilot and other consumer AI products and research, including Bing, Edge, and generative AI research.
What you will do
- Design, implement, test, and optimize AI models in Python and CUDA C++ for large-scale GPU clusters.
- Profile, benchmark, and debug performance bottlenecks across compute, memory, and networking subsystems.
- Optimize collective communication libraries (e.g., NCCL) for emerging NVLink and InfiniBand topologies.
- Collaborate with hardware teams to optimize for next-generation accelerators.
- Gather data and insights to develop the pretraining compute roadmap.
Requirements
- Bachelor’s Degree in Computer Science or related technical field AND 6+ years of technical engineering experience coding in languages including C, C++, C#, Java, JavaScript, or Python.
- Experience with GPU programming (CUDA, NCCL) and frameworks such as PyTorch.
- Proven ability to profile, benchmark, and optimize performance-critical systems.
- Experience building infrastructure for large-scale machine learning or generative AI workloads.
- Strong background in distributed computing and large-scale systems.
Nice to have
- Master’s Degree in Computer Science or related technical field AND 8+ years of technical engineering experience, or Bachelor’s Degree AND 12+ years.
- Experience in leading technical projects and supporting architectural decisions with data.
- Deep expertise in networking (InfiniBand, NVLink), storage systems, or distributed training parallelism strategies.
- Track record of contributing to high-performance computing or large-scale AI infrastructure projects.
Culture & Benefits
- Embody a growth mindset, innovate to empower others, and collaborate to realize shared goals.
- Work within values of respect, integrity, and accountability to create a culture of inclusion.
- Access to comprehensive benefits and other compensation as outlined on the Microsoft careers page.