Engineering Manager, Accelerator Platform
$405,000 - $485,000
Job description
TL;DR
Engineering Manager, Accelerator Platform: Build and lead a team responsible for the bring-up and normalization of new hardware platforms for the company's inference fleet, bridging the gap between low-level systems and serving infrastructure. The role spans hardware enablement, distributed systems, and ML infrastructure, with the goal of shipping each new accelerator generation as a first-class production platform.
Location: Hybrid (San Francisco, CA | New York City, NY | Seattle, WA). Staff are expected to be in one of the offices at least 25% of the time.
Salary: $405,000 - $485,000 USD
Company
The company's mission is to create reliable, interpretable, and steerable AI systems.
What you will do
- Build and lead the Accelerator Platform team -- hiring, developing, and retaining engineers.
- Own the end-to-end bring-up lifecycle for new accelerator platforms (multiple generations of Trainium, TPUs, and GPUs), from initial silicon availability through production-ready inference.
- Define and drive the platform normalization layer -- ensuring new hardware integrates cleanly with the company's inference serving stack to provide a consistent abstraction (see the sketch after this list).
- Partner with cloud providers (AWS, GCP, Microsoft Azure) and chip vendors on hardware roadmaps, capacity planning, and platform-specific technical challenges.
- Collaborate closely with teams across Inference and Infrastructure to ensure new platforms meet production reliability and latency requirements from day one.
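To make the "platform normalization layer" responsibility concrete, here is a minimal, purely illustrative sketch of what such an abstraction might look like. It is not part of the posting and does not describe the company's actual stack; every name (AcceleratorBackend, TrainiumBackend, DeviceTopology) is invented for this example.

# Hypothetical sketch: each accelerator generation implements one backend
# interface so the serving stack stays hardware-agnostic. Illustrative only.
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class DeviceTopology:
    """Describes the accelerators visible on one host and how they interconnect."""
    device_count: int
    interconnect: str  # e.g. "nvlink", "ici", "neuron-link"


class AcceleratorBackend(ABC):
    """Uniform contract the inference serving stack programs against."""

    @abstractmethod
    def discover_topology(self) -> DeviceTopology:
        """Enumerate devices and interconnect layout on the current host."""

    @abstractmethod
    def load_model(self, checkpoint_path: str) -> None:
        """Compile and load a model for this hardware generation."""

    @abstractmethod
    def health_check(self) -> bool:
        """Report whether the devices are ready to serve traffic."""


class TrainiumBackend(AcceleratorBackend):
    """Example: one concrete backend per hardware platform (Trainium, TPU, GPU)."""

    def discover_topology(self) -> DeviceTopology:
        return DeviceTopology(device_count=16, interconnect="neuron-link")

    def load_model(self, checkpoint_path: str) -> None:
        # Platform-specific compilation and weight placement would go here.
        pass

    def health_check(self) -> bool:
        return True

The point of such a layer is that serving code calls only the shared interface, so bringing up a new hardware generation means adding one backend rather than touching the serving stack.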
Requirements
- Have significant experience managing infrastructure or platform engineering teams (3+ years in engineering management).
- Have deep technical fluency in systems programming, distributed systems, or hardware/software co-design.
- Have experience bringing up or operating heterogeneous compute infrastructure at scale.
- Are comfortable with ambiguity and can build structure where none exists.
- Build strong cross-functional relationships.
Nice to have
- Have direct experience with ML accelerator architectures (GPU/CUDA, TPU/XLA, Trainium/Neuron, or similar).
- Have worked on ML inference serving infrastructure at scale (1000+ accelerators).
- Have experience with Kubernetes-based ML workload orchestration.
- Understand ML-specific networking (RDMA, InfiniBand, NVLink, ICI) and how interconnect topology affects serving performance.
- Have experience managing vendor relationships and influencing hardware/software roadmaps.
Culture & Benefits
- Competitive compensation and benefits.
- Optional equity donation matching.
- Generous vacation and parental leave.
- Flexible working hours.
- A lovely office space in which to collaborate with colleagues.
Hiring process
- We encourage you to apply even if you do not believe you meet every single qualification.