Company hidden
Posted 2 days ago

Senior/Staff Software Engineer (ML Infrastructure)

Work format: onsite
Employment type: full-time
Grade: senior/principal
English: B2
Country: US
Listed in Hirify RU Global, a list of companies with Eastern European roots.

Job description

TL;DR

Senior/Staff Software Engineer (ML Infrastructure): Designing, building, and operating foundational systems for large-scale machine learning and AI at Slack, with an emphasis on distributed model training, serving, and deployment. Focus on evolving GPU-backed inference infrastructure, optimizing data processing systems, and setting long-term architectural direction for ML infrastructure.

Location: Onsite in Seattle, Austin, Atlanta, or Bellevue, USA

Company

hirify.global's Slack AI team focuses on transforming how people work by making Slack an AI-powered operating system.

What you will do

  • Design, build, and operate scalable, reliable, and performant systems for ML model training, serving, and deployment.
  • Evolve GPU-backed inference infrastructure to support high-throughput, latency-sensitive AI workloads.
  • Architect and optimize distributed training and data processing systems using technologies like Ray, Airflow, or Spark.
  • Build and maintain Kubernetes-based platforms and orchestration layers, including tools like KubeRay and vLLM.
  • Architect solutions to bridge legacy systems with modern technologies while ensuring application stability.
  • Develop robust monitoring, observability, and alerting for production ML workloads.
  • Provide technical leadership through design reviews, mentorship, and setting engineering standards.

Requirements

  • Significant professional experience in software engineering with a strong focus on infrastructure, backend systems, platform engineering, or MLOps.
  • Deep experience building and operating distributed systems, including expert-level knowledge of Kubernetes.
  • Hands-on experience with modern ML infrastructure and serving stacks (e.g., Ray, KubeRay, vLLM).
  • Experience working with GPU infrastructure, including performance optimization and operational management at scale.
  • Strong experience with data infrastructure and orchestration technologies (e.g., Airflow, Spark).
  • Experience building and operating cloud-native systems on public cloud platforms (AWS, GCP, Azure) and infrastructure as code.
  • Demonstrated ability to drive technical direction for complex systems and balance short-term delivery with long-term architectural goals.
  • Excellent written communication and ability to thrive in an asynchronous team environment.

Culture & Benefits

  • Join a team shaping the future of work by making Slack an AI-powered operating system.
  • Contribute to deep architectural decisions for large-scale, high-performance ML and AI systems.
  • Work on complex scalability and reliability challenges at the intersection of distributed systems and ML.
  • Opportunity to thrive in an asynchronous and globally distributed infrastructure team.
