Member of Technical Staff, AI Training Infrastructure (AI)
Job description
TL;DR
Member of Technical Staff, AI Training Infrastructure (AI): Design and optimize large-scale AI training systems for LLMs and multimodal models, with an emphasis on distributed training performance and data pipeline scalability. The role focuses on architecting robust infrastructure, solving high-performance computing bottlenecks, and automating orchestration for model development.
Location: San Mateo, CA
Salary: $175,000–$220,000 USD
Company
A high-growth startup building next-generation generative AI infrastructure and industry-leading LLM inference platforms.
What you will do
- Design and implement scalable infrastructure tailored for large-scale model training workloads.
- Develop and maintain distributed training pipelines for LLMs and multimodal architectures.
- Optimize training performance across multi-GPU and multi-node clusters.
- Architect reliable data storage solutions for massive training datasets.
- Automate infrastructure provisioning, orchestration, and scaling.
- Collaborate with AI researchers to implement and troubleshoot complex distributed training methodologies.
Requirements
- Bachelor's degree in Computer Science or equivalent practical experience.
- 3+ years of professional experience in distributed systems and ML infrastructure.
- Proficiency with PyTorch and containerization technologies like Docker and Kubernetes.
- Strong background in cloud platforms such as AWS, GCP, or Azure.
- Deep understanding of distributed training techniques like FSDP, data parallelism, and model parallelism.
Nice to have
- Master's or PhD in Computer Science.
- Experience training large language models or complex multimodal systems.
- Background in optimizing high-performance distributed computing systems.
- Familiarity with ML DevOps practices and workflow orchestration tools.
- Proven contributions to open-source ML infrastructure.
Culture & Benefits
- Meaningful equity participation in a well-funded, fast-growing startup.
- Opportunity to solve high-complexity AI infrastructure challenges with bleeding-edge technology.
- Collaborative, flat-structure environment with minimal bureaucracy.
- Work directly with world-class engineers who come from the Meta PyTorch and Google Vertex AI teams.
- Competitive salary and comprehensive benefits package.