Member of Technical Staff, Pre-Training Infrastructure (AI)

$139,900 - $274,800

Work format

hybrid

Employment type

full-time

Grade

middle

Job description

TL;DR

Member of Technical Staff, Pre-Training Infrastructure (AI): Contribute to a fast-moving codebase that enables training at unprecedented scale, with an emphasis on building and optimizing the software stack for massive GPU clusters and high-throughput storage systems. The role focuses on profiling, benchmarking, debugging, and fine-grained optimization, and demands both engineering rigor and creativity.

Location: Must work from a designated hirify.global office at least four days a week if you live within 50 miles (U.S.) or 25 miles (non-U.S.) of that office.

Salary: USD $139,900 – $274,800 per year.

Company

hirify.global’s mission is to empower every person and every organization on the planet to achieve more.

What you will do

Design, implement, test, and optimize distributed training infrastructure in Python and C++ for large-scale GPU clusters.
Profile, benchmark, and debug performance bottlenecks across compute, memory, networking, and storage subsystems.
Optimize collective communication libraries (e.g., NCCL) for emerging NVLink and InfiniBand topologies.
Collaborate with hardware teams to optimize for next-generation accelerators (NVIDIA, AMD, and beyond).
Gather data and insights to develop the pretraining compute roadmap.
Actively contribute to the development of AI models powering our innovative products.

Requirements

Bachelor’s Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
Experience in distributed computing and large-scale systems.
Experience with GPU programming (CUDA, NCCL) and frameworks such as PyTorch.
Proven ability to profile, benchmark, and optimize performance-critical systems.
Experience in leading technical projects and supporting architectural decisions with data.

Nice to have

Master’s Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor’s Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
Experience building infrastructure for large-scale machine learning or generative AI workloads.
Experience in networking (InfiniBand, NVLink), storage systems, or distributed training parallelisms.
Track record of contributing to high-performance computing or large-scale AI infrastructure projects.

Culture & Benefits

Come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals.
Build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
