Senior Software Engineer (AI Middleware)
Job description
TL;DR
Senior Software Engineer (AI Middleware): Design, develop, and optimize AI communication middleware for high-performance networking in AI/HPC datacenters, with an emphasis on enabling collective communication libraries such as NCCL/RCCL over custom interconnects. The role focuses on profiling distributed AI workloads, tuning frameworks such as PyTorch Distributed and DeepSpeed, and contributing upstream to open-source projects.
Location: Remote for employees residing within the United States.
Company
Delivering high-performance scale-out networking solutions for AI and HPC datacenters, integrating hardware, software, and system technologies for GPU/CPU clusters.
What you will do
- Design and implement performance-critical features for collective communication library (CCL) enablement on the company's fabrics.
- Optimize distributed training across multi-node, multi-GPU setups, including GPU-direct transfers and synchronization.
- Profile AI workloads to identify bottlenecks in software/hardware stacks.
- Tune AI frameworks like PyTorch Distributed, TensorFlow/XLA, JAX, DeepSpeed, and Megatron-LM.
- Develop benchmarks aligned with real model performance and contribute upstream to AI projects.
- Collaborate with kernel/driver, switch, performance, and systems teams on design reviews and escalations.
Requirements
- Residence within the United States (required for this remote position).
- 8+ years in high-performance systems programming in C/C++ on Linux.
- Strong experience with GPU communication stacks including CUDA/ROCm and NCCL/RCCL.
- Ability to optimize distributed training using profiling and tracing.
- Understanding of collective communication and topology awareness.
- Experience delivering production-quality code and open-source contributions.
Nice to have
- Experience with AI frameworks like PyTorch Distributed, DeepSpeed, Megatron-LM.
- Familiarity with libfabric/OFI, UCX, RDMA, RoCEv2, and Ultra Ethernet.
- Building cluster-scale performance test infrastructure.
Culture & Benefits
- Competitive compensation with equity, cash incentives, medical/dental/vision, disability/life insurance.
- 401(k) with company match, Open Time Off (OTO), sick time, bonding/pregnancy leave.
- Flexible work environment with onsite, hybrid, and fully remote roles in a global team.
- Opportunity to collaborate with leaders in the semiconductor industry.