Member of Technical Staff (AI)
Job description
TL;DR
Member of Technical Staff (AI): Develop and optimize large-scale distributed training infrastructure for AI models, with an emphasis on compute efficiency, GPU performance, and scalability. The role focuses on profiling bottlenecks, designing high-performance software for LLM training, and ensuring reliability across massive supercomputing fleets.
Location: Must be based in the San Francisco area and work in-office 4 days a week in Mountain View, CA.
Salary: $158,400 – $304,200 per year (depending on level and specific location).
Company
A global technology leader driving innovation in artificial intelligence and superintelligence, committed to building safe and controllable AI systems for billions of users.
What you will do
- Design, implement, and optimize distributed training infrastructure in Python and C++ for massive GPU clusters.
- Develop telemetry systems to monitor ML model performance, utilization, and cost-related metrics.
- Profile and debug performance bottlenecks across compute, memory, and storage subsystems.
- Optimize collective communication libraries like NCCL for advanced network topologies.
- Collaborate with researchers and hardware teams to prepare for next-generation accelerators, including NVIDIA hardware and MAIA.
- Drive architectural improvements that deliver measurable efficiency gains across ML services.
Requirements
- Bachelor’s Degree in Computer Science or related technical discipline.
- 6+ years of technical engineering experience coding in languages such as C, C++, C#, Java, JavaScript, or Python.
- Must be local to the San Francisco area and willing to work in-office 4 days per week in Mountain View.
- Deep understanding of GPU architectures and modern LLM architectures.
- Proven experience in profiling and analyzing performance in large-scale distributed computing and ML systems.
- Experience with low-level GPU programming using tools such as CUDA, Triton, or NCCL.
Nice to have
- Master’s or higher degree in Computer Science or related fields.
- 10+ years of technical engineering experience for senior-level roles.
- Experience with high-performance frameworks like PyTorch or JAX.
- Knowledge of InfiniBand networking and storage system architecture.
Culture & Benefits
- Collaborative, growth-oriented culture focused on integrity, inclusion, and accountability.
- Opportunity to work on frontier-scale AI infrastructure with a startup-like team inside a global leader.
- Access to comprehensive corporate health and financial benefits.
- Strong focus on safety-aligned AI and creating positive global impact.