Principal Engineer, Data & Compute (AI Infrastructure)

Work format

hybrid

Employment type

full-time

Grade

principal

English

Country

Job description

Text:

TL;DR

Principal Engineer, Data & Compute (AI Infrastructure): Designing and guiding the evolution of foundational compute and storage systems for end-to-end neural network training and inference at unprecedented scale, with an emphasis on global compute strategy, petabyte-scale data federation, and cross-region GPU job execution. The focus is on designing highly performant, resilient, and cost-efficient architectures for the AI model development lifecycle, enabling rapid model deployment and ensuring platform scalability.

Location: Hybrid in Sunnyvale, California, USA

Company

hirify.global is a leading developer of Embodied AI technology, creating advanced AI software and foundation models for automated driving systems.

What you will do

Define and evolve global compute architecture for thousands of GPUs across data centers, ensuring optimal throughput and cost efficiency.
Design petabyte-scale data federation systems for fast, reliable access to high-volume sensor and simulation data across geographies.
Build the foundations that let large-scale AI workloads run seamlessly across hybrid and multi-cloud environments.
Act as a trusted partner to leadership in aligning compute investments and architecture with company strategy.
Provide technical leadership and mentorship, cultivating operational and engineering excellence across the engineering organization.

Requirements

10+ years designing and building large-scale distributed systems, with at least 4 years focused on GPU-based cloud infrastructure.
Proven experience enabling large-scale AI training, inference, or computer vision workloads in GPU clusters.
Deep understanding of petabyte-scale data architecture, including storage federation, high-throughput access, and data locality for AI workloads.
Strong technical leadership with a track record of defining and communicating architectural strategy.
A natural mentor with a history of developing engineers and influencing technical direction across teams.
Advanced degree in Computer Science, Electrical Engineering, or a related field—or equivalent industry experience.

Nice to have

Experience with multi-cloud orchestration, particularly in latency- or cost-sensitive training and inference pipelines.
Familiarity with systems like Ray, Kubernetes, Airflow, or Flyte, and deep fluency in AI/ML job scheduling, model lifecycle management, and infrastructure-as-code practices.
Background in supporting safety-critical or real-time inference use cases (e.g., robotics, autonomous vehicles, aerospace).
Passion for building infrastructure-as-a-product that delivers performance and simplicity to research and product teams alike.

Culture & Benefits

Hybrid working policy combining time in offices/workshops with working from home.
Committed to creating a diverse, fair, and respectful culture inclusive of everyone.
Embrace uncertainty and complex challenges to unlock groundbreaking solutions.
Value diversity, embrace new perspectives, and foster an inclusive work environment.
Learn and evolve constantly in pursuit of excellence.