TL;DR
Research Software Engineer (AI): Building and optimizing core infrastructure for frontier AI models, with an emphasis on distributed training systems and massive-scale data pipelines. Focus on translating research prototypes into reliable, scalable, and numerically stable training systems for large-scale GPU clusters.
Location: Must be based in New York, London, or San Francisco (On-site)
Company
hirify.global is an AI startup dedicated to building open superintelligence and accessible foundation models.
What you will do
- Design and optimize large-scale reinforcement learning training loops and data pipelines.
- Implement state-of-the-art training techniques ensuring numerical stability and computational efficiency.
- Build internal tooling to launch, monitor, and reproduce complex experiments.
- Diagnose performance bottlenecks across the training stack, including GPU memory and communication overhead.
- Translate research prototypes from paper to reusable, production-grade infrastructure.
Requirements
- Strong software engineering background with the ability to implement complex research papers.
- Deep experience in either Distributed Training/Inference or Large-scale Data Infrastructure.
- Expertise at the intersection of machine learning algorithms, distributed systems, and high-performance computing.
- Proficiency with PyTorch or JAX and training stacks like Megatron.
- Experience with orchestration tools such as Ray, Kubernetes, or Slurm.
Culture & Benefits
- Top-tier compensation package including salary and equity.
- Comprehensive health, dental, and vision insurance coverage.
- Fully paid parental leave and family planning financial support.
- Lunch and dinner provided daily in the office.
- Regular team off-sites and celebrations.