2 months ago

Staff Software Engineer (AI Runtime)

$190,000 - $265,000

Work format

On-site

Employment type

Full-time

Grade

Senior

Vacancy from Hirify RU Global, a list of companies with Eastern European roots

Job description

TL;DR

Staff Software Engineer (AI Runtime): Building and scaling the managed GPU training platform for large-scale AI models, with an emphasis on distributed training performance and fault tolerance. The role focuses on designing multi-node orchestration, optimizing GPU efficiency, and developing resilience foundations for frontier-scale foundation models.

Location: Mountain View, California or San Francisco, California

Salary: $190,000 — $265,000 USD

Company

Databricks is a data and AI company providing a Data Intelligence Platform that unifies data, analytics, and AI for over 10,000 organizations worldwide.

What you will do

Drive the architecture and evolution of the AI Runtime (AIR) managed GPU training platform for scalable, high-throughput training.
Solve complex problems in multi-node orchestration, distributed parallelism strategies, and GPU scheduling.
Optimize GPU efficiency and training performance to raise utilization and lower cost per training run.
Build resilience and observability foundations to detect and recover from hardware and software failures.
Partner with product and research teams to shape APIs, CLI, and the developer experience for production training jobs.
Mentor senior engineers and champion engineering excellence to shape the long-term technical direction of AI training infrastructure.

Requirements

10+ years of experience building and operating large-scale distributed systems, GPU training infrastructure, or ML systems.
Hands-on experience with distributed training frameworks such as PyTorch, FSDP, DeepSpeed, or Megatron.
Deep understanding of parallelism strategies (data, tensor, pipeline, and sequence parallelism).
Strong grasp of GPU performance fundamentals, including NVLink, InfiniBand, and collective communication.
Experience building managed, multi-tenant cloud platform products with clear SLAs and SLOs.
BS in Computer Science or a related field (MS or PhD preferred).

Culture & Benefits

Comprehensive benefits and perks tailored to the employee's region.
Opportunity to work on the most demanding workloads in computing, including frontier-scale foundation models.
Collaborative environment partnering across product, research, and platform teams.
Commitment to diversity, inclusion, and equal employment opportunity standards.
