Company hidden
5 days ago

Member Of Technical Staff, AI Training Infrastructure (AI)

$175,000–$220,000
Work format
onsite
Employment type
fulltime
Level
middle/senior
English
B2
Country
US
Vacancy from Hirify.Global, a list of international tech companies

Job description


TL;DR

Member of Technical Staff, AI Training Infrastructure (AI): Designing and optimizing large-scale AI training systems for LLMs and multimodal models, with an emphasis on distributed training performance and data pipeline scalability. Focus on architecting robust infrastructure, solving high-performance computing bottlenecks, and automating orchestration for model development.

Location: San Mateo, CA

Salary: $175,000–$220,000 USD

Company

hirify.global is a high-growth startup building next-generation generative AI infrastructure and industry-leading LLM inference platforms.

What you will do

  • Design and implement scalable infrastructure tailored for large-scale model training workloads.
  • Develop and maintain distributed training pipelines for LLMs and multimodal architectures.
  • Optimize training performance across multi-GPU and multi-node clusters.
  • Architect reliable data storage solutions for massive training datasets.
  • Automate infrastructure provisioning, orchestration, and scaling.
  • Collaborate with AI researchers to implement and troubleshoot complex distributed training methodologies.

Requirements

  • Bachelor's degree in Computer Science or equivalent practical experience.
  • 3+ years of professional experience in distributed systems and ML infrastructure.
  • Proficiency with PyTorch and containerization technologies like Docker and Kubernetes.
  • Strong background in cloud platforms such as AWS, GCP, or Azure.
  • Deep understanding of distributed training techniques like FSDP, data parallelism, and model parallelism.
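
As background on the last requirement: in data parallelism, each worker holds a full model replica, processes its own shard of the batch, and the per-worker gradients are averaged via an all-reduce before the update; FSDP additionally shards parameters and optimizer state across workers. A minimal pure-Python sketch of the gradient-averaging step (illustrative only; the function names and toy scalar model are assumptions, not this team's stack):

```python
# Illustrative sketch of the core data-parallel step: each worker computes a
# gradient on its own shard of the batch, then the gradients are averaged so
# every replica applies an identical update (the "all-reduce").
# Toy scalar model y = w * x with squared-error loss; names are hypothetical.

def shard_batch(batch, num_workers):
    """Split a batch into contiguous, roughly equal shards, one per worker."""
    size = (len(batch) + num_workers - 1) // num_workers
    return [batch[i * size:(i + 1) * size] for i in range(num_workers)]

def local_gradient(shard, w):
    """Per-worker gradient of mean((w*x - y)^2) with respect to w."""
    return sum(2 * x * (w * x - y) for x, y in shard) / len(shard)

def allreduce_mean(grads):
    """Stand-in for the collective all-reduce used by DDP/FSDP."""
    return sum(grads) / len(grads)

# One synchronous data-parallel step across two simulated workers.
batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]  # (x, target) pairs
w = 1.0
grads = [local_gradient(shard, w) for shard in shard_batch(batch, 2)]
w -= 0.01 * allreduce_mean(grads)  # identical update on every replica
```

With equal shard sizes, the averaged gradient matches the full-batch gradient exactly, which is why synchronous data parallelism is numerically equivalent to single-device training at the combined batch size.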

Nice to have

  • Master's or PhD in Computer Science.
  • Experience training large language models or complex multimodal systems.
  • Background in optimizing high-performance distributed computing systems.
  • Familiarity with ML DevOps practices and workflow orchestration tools.
  • Proven contributions to open-source ML infrastructure.

Culture & Benefits

  • Meaningful equity participation in a well-funded, fast-growing startup.
  • Opportunity to solve high-complexity AI infrastructure challenges with bleeding-edge technology.
  • Collaborative, flat-structure environment with minimal bureaucracy.
  • Work directly with world-class engineers from Meta PyTorch and Google Vertex AI backgrounds.
  • Competitive salary and comprehensive benefits package.

Be careful: if an employer asks you to log into their system via iCloud/Google, to send a code or password, or to run code or software, do not do it: these are scammers. Be sure to click "Report" or contact support. More details in the guide.