Posted 3 days ago

Senior Software Engineer (AI)

$160,000 - $225,000
Job type: full-time
Grade: senior
English: B2
Country: US
Job from the Hirify.Global list (Hirify RU Global, a list of companies with Eastern European roots)

Job description

TL;DR

Senior Software Engineer (AI Runtime): Build and scale a managed GPU training platform for large-scale AI model training and fine-tuning, with an emphasis on multi-node orchestration and distributed parallelism. The role focuses on optimizing training throughput, keeping the system resilient through failure detection, and maximizing GPU utilization across diverse hardware.

Location: Must be based in Mountain View or San Francisco, California

Salary: $160,000 - $225,000 USD

Company

Databricks is a data and AI company providing a Data Intelligence Platform used by over 10,000 organizations to unify data, analytics, and AI.

What you will do

  • Drive the architecture and evolution of the AI Runtime (AIR) managed GPU training platform for fleets of thousands of accelerators.
  • Solve complex challenges in multi-node orchestration, distributed parallelism strategies, and GPU scheduling.
  • Optimize GPU efficiency, increasing model FLOPs utilization and overall end-to-end throughput.
  • Develop resilience and observability foundations to detect and recover from hardware and software failures automatically.
  • Collaborate with product and research teams to design the APIs, CLI, and developer experience for production training jobs.
  • Mentor other engineers and lead end-to-end engineering efforts from design to production rollout.

Requirements

  • 5+ years of experience building large-scale distributed systems, GPU training infrastructure, or ML systems.
  • Proficiency with distributed training frameworks such as PyTorch, FSDP, DeepSpeed, or Megatron.
  • Deep understanding of GPU performance, including accelerator architecture and interconnects such as NVLink, InfiniBand, or RoCE.
  • Experience operating managed multi-tenant cloud platform products with strict SLAs and SLOs.
  • Strong foundation in algorithms, data structures, and performance-sensitive system design.
  • BS in Computer Science or a related field (MS or PhD preferred).

Culture & Benefits

  • Comprehensive benefits and perks tailored to the region.
  • Opportunity to work on frontier-scale foundation models and cutting-edge AI infrastructure.
  • Collaborative environment partnering across product, research, and platform teams.
  • Commitment to diversity, inclusion, and equal employment opportunity.
