Company hidden
Posted 2 days ago

Senior Engineering Manager, AI Runtime (AI)

$228,600 - $297,120
Work format: onsite
Employment type: fulltime
Grade: lead
English: b2
Country: US

Job description

TL;DR

Senior Engineering Manager, AI Runtime (AI): Lead the team responsible for the Custom Training product and its foundational infrastructure, with an emphasis on distributed training orchestration, cluster lifecycle, and training efficiency. The role focuses on architectural decisions and product design for managed GPU training at scale.

Location: Mountain View, California; San Francisco, California

Salary: $228,600 - $297,120 USD

Company

hirify.global provides a data and AI infrastructure platform that unifies and democratizes data, analytics, and AI for organizations worldwide.

What you will do

  • Lead, mentor, and grow a high-performing engineering team.
  • Define and own the product and technical roadmap for AI Runtime (AIR).
  • Collaborate with product, research, platform, infrastructure teams, and customers to drive end-to-end delivery.
  • Drive architectural decisions and product design for managed GPU training at scale.
  • Build observability and reliability practices for long-running, multi-node training jobs.
  • Partner with recruiting to attract, hire, and develop top-tier engineering talent.

Requirements

  • 8+ years of software engineering experience, with 3+ years in engineering management.
  • Track record of building and operating managed GPU training infrastructure at scale (hundreds to thousands of GPUs).
  • Deep familiarity with distributed training frameworks (PyTorch, DeepSpeed, Composer, Megatron-LM) and parallelism strategies (FSDP, tensor/pipeline parallelism).
  • Experience with training resilience patterns: checkpointing, elastic training, and automated failure recovery for long-running jobs.
  • Understanding of GPU performance fundamentals including NCCL, interconnect topologies, and memory optimization.
  • Experience building platform products with clear SLAs where you've owned the customer experience.
  • Strong cross-functional leadership across platform, product, and research teams.
  • BS/MS in Computer Science, Electrical Engineering, or related technical field.

Culture & Benefits

  • Comprehensive benefits and perks to meet the needs of all employees.
  • Committed to fostering a diverse and inclusive culture where everyone can excel.
  • Hiring practices are inclusive and meet equal employment opportunity standards.
