Senior Engineering Manager, AI Runtime (AI)
Job description
TL;DR
Senior Engineering Manager, AI Runtime (AI): Lead the team responsible for the Custom Training product and its foundational infrastructure, with an emphasis on distributed training orchestration, cluster lifecycle, and training efficiency. Own architectural decisions and product design for managed GPU training at scale.
Location: Mountain View, California; San Francisco, California
Salary: $228,600 – $297,120 USD
Company
The company provides a data and AI infrastructure platform that unifies and democratizes data, analytics, and AI for organizations worldwide.
What you will do
- Lead, mentor, and grow a high-performing engineering team.
- Define and own the product and technical roadmap for AI Runtime (AIR).
- Collaborate with product, research, platform, infrastructure teams, and customers to drive end-to-end delivery.
- Drive architectural decisions and product design for managed GPU training at scale.
- Build observability and reliability practices for long-running, multi-node training jobs.
- Partner with recruiting to attract, hire, and develop top-tier engineering talent.
Requirements
- 8+ years of software engineering experience, with 3+ years in engineering management.
- Track record of building and operating managed GPU training infrastructure at scale (hundreds to thousands of GPUs).
- Deep familiarity with distributed training frameworks (PyTorch, DeepSpeed, Composer, Megatron-LM) and parallelism strategies (FSDP, tensor/pipeline parallelism).
- Experience with training resilience patterns: checkpointing, elastic training, and automated failure recovery for long-running jobs.
- Understanding of GPU performance fundamentals including NCCL, interconnect topologies, and memory optimization.
- Experience building platform products with clear SLAs where you've owned the customer experience.
- Strong cross-functional leadership across platform, product, and research teams.
- BS/MS in Computer Science, Electrical Engineering, or related technical field.
Culture & Benefits
- Comprehensive benefits and perks to meet the needs of all employees.
- Committed to fostering a diverse and inclusive culture where everyone can excel.
- Hiring practices are inclusive and meet equal employment opportunity standards.