Principal Engineer, Data & Compute (AI Infrastructure)
Job description
TL;DR
Principal Engineer, Data & Compute (AI Infrastructure): Design and guide the evolution of foundational compute and storage systems for end-to-end neural networks, with an emphasis on GPU orchestration and petabyte-scale data federation. The focus is on building cross-region GPU job execution and scaling infrastructure toward exabyte scale to accelerate AI research and model deployment.
Location: Must be based in Sunnyvale, CA (Hybrid)
Salary: $370,300 – $418,200 plus a competitive equity package
Company
A leading developer of Embodied AI technology, creating mapless, hardware-agnostic AI products for automated driving.
What you will do
- Define and evolve global compute strategy for orchestrating training and inference across thousands of GPUs and multiple data centers.
- Design petabyte-scale data federation systems for high-volume sensor and simulation data, preparing the company for exabyte scale.
- Build the foundations for large-scale AI workloads to run seamlessly across hybrid and multi-cloud environments.
- Align compute investments and architecture with company strategy, growth plans, and performance goals.
- Uplift the engineering organization through architectural coaching, technical deep dives, and mentorship.
Requirements
- 10+ years of experience designing and building large-scale distributed systems.
- 4+ years of experience focused on GPU-based cloud infrastructure.
- Proven track record of enabling large-scale AI training, inference, or computer vision workloads in GPU clusters.
- Deep understanding of petabyte-scale data architecture, including storage federation and high-throughput access.
- Strong technical leadership skills with a history of defining architectural strategy and developing engineers.
- Advanced degree in Computer Science, Electrical Engineering, or equivalent industry experience.
- Must be based in Sunnyvale, CA (Hybrid).
Nice to have
- Experience with multi-cloud orchestration for latency- or cost-sensitive pipelines.
- Fluency in AI/ML job scheduling and tools such as Ray, Kubernetes, Airflow, or Flyte.
- Experience with infrastructure-as-code practices.
- Background in supporting safety-critical or real-time inference for robotics, autonomous vehicles, or aerospace.
Culture & Benefits
- Hybrid working policy combining office/workshop time with working from home.
- Inclusive work environment that values diversity and new perspectives.
- Competitive equity package.
- Opportunity to tackle unprecedented scale in data infrastructure and compute orchestration in a fast-paced environment.