Principal Engineer, Data & Compute (AI Infrastructure)
Job description
TL;DR
Principal Engineer, Data & Compute (AI Infrastructure): Designing and guiding the evolution of foundational compute and storage systems for end-to-end neural network training and inference at unprecedented scale, with an emphasis on global compute strategy, petabyte-scale data federation, and cross-region GPU job execution. The role focuses on designing highly performant, resilient, and cost-efficient architectures for the AI model development lifecycle, enabling rapid model deployment and ensuring platform scalability.
Location: Hybrid in Sunnyvale, California, USA
Company
The company is a leading developer of Embodied AI technology, creating advanced AI software and foundation models for automated driving systems.
What you will do
- Define and evolve global compute architecture for thousands of GPUs across data centers, ensuring optimal throughput and cost efficiency.
- Design petabyte-scale data federation systems for fast, reliable access to high-volume sensor and simulation data across geographies.
- Build foundations to enable large-scale AI workloads to run seamlessly across hybrid and multi-cloud environments.
- Act as a trusted partner to leadership in aligning compute investments and architecture with company strategy.
- Provide technical leadership and mentorship, cultivating operational and engineering excellence across the engineering organization.
Requirements
- 10+ years designing and building large-scale distributed systems, with at least 4 years focused on GPU-based cloud infrastructure.
- Proven experience enabling large-scale AI training, inference, or computer vision workloads in GPU clusters.
- Deep understanding of petabyte-scale data architecture, including storage federation, high-throughput access, and data locality for AI workloads.
- Strong technical leadership with a track record of defining and communicating architectural strategy.
- A natural mentor with a history of developing engineers and influencing technical direction across teams.
- Advanced degree in Computer Science, Electrical Engineering, or a related field—or equivalent industry experience.
Nice to have
- Experience with multi-cloud orchestration, particularly in latency- or cost-sensitive training and inference pipelines.
- Familiarity with systems like Ray, Kubernetes, Airflow, or Flyte, and deep fluency in AI/ML job scheduling, model lifecycle management, and infrastructure-as-code practices.
- Background in supporting safety-critical or real-time inference use cases (e.g., robotics, autonomous vehicles, aerospace).
- Passion for building infrastructure-as-a-product that delivers performance and simplicity to research and product teams alike.
Culture & Benefits
- Operate a hybrid working policy combining time in offices/workshops with working from home.
- Committed to creating a diverse, fair, and respectful culture inclusive of everyone.
- Embrace uncertainty and complex challenges to unlock groundbreaking solutions.
- Value diversity, embrace new perspectives, and foster an inclusive work environment.
- Constantly learning and evolving in pursuit of excellence.