Company hidden
11 hours ago

Staff Software Engineer (AI Research Infrastructure)

$190,000 - $270,000
Work format: onsite
Employment type: full-time
Grade: senior
English: B2
Country: US
Listed in Hirify RU Global, a list of companies with Eastern European roots

Job description

TL;DR

Staff Software Engineer (AI Research Infrastructure): Build and operate the research stack that powers large-scale AI training and inference experiments across thousands of GPUs, with an emphasis on scheduling, orchestration, observability, and developer tooling. The role centers on designing abstractions for rapid iteration, turning experimental workloads into robust pipelines, and influencing the long-term roadmap for research computation.

Location: New York City, New York; San Francisco, California

Salary: $190,000 — $270,000 USD

Company

hirify.global is the data and AI company relied on by over 10,000 organizations worldwide, including over 50% of the Fortune 500, to unify data, analytics, and AI.

What you will do

  • Design and implement infrastructure for large-scale experiments, data processing, and model training on HPC clusters, GPU fleets, or cloud systems.
  • Build abstractions for job submission, scheduling, and monitoring so that researchers can launch experiments in minutes.
  • Create tooling such as experiment management systems, CI and testing for research code, and workflows that boost developer productivity.
  • Partner with research scientists, ML engineers, and platform teams to productionize experimental workloads.
  • Influence the roadmap for research computation and mentor other engineers on compute and AI systems.

Requirements

  • BS, MS, or PhD in Computer Science or a related field.
  • 5+ years of software engineering experience with large-scale distributed systems or infrastructure.
  • Deep experience building and operating distributed systems, data pipelines, or backend services involving GPUs, clusters, or cloud providers.
  • Proficiency in systems programming languages like C++, Rust, Go, Java, or Scala.
  • Experience with cluster schedulers, resource managers, or job orchestration systems (e.g., Kubernetes, Slurm, Ray).
  • Understanding of modern ML training and inference workflows (distributed training, model parallelism, fine-tuning).

Culture & Benefits

  • Comprehensive benefits tailored to employee needs in their region.
  • Eligibility for annual performance bonus and equity.
  • Commitment to diversity, inclusion, and equal employment opportunity.
  • Global offices with a focus on fostering an inclusive culture.