Company hidden
2 months ago

Software Engineer (Machine Learning Infrastructure)

Job type
Full-time
Level
Senior/Lead
English
B2
Country
US
Listing from Hirify RU Global, a list of companies with Eastern European roots

Job description

TL;DR

Software Engineer (ML Infrastructure): Design, build, and operate foundational systems for large-scale machine learning training, serving, and deployment at Slack, with an emphasis on distributed systems, GPU infrastructure, and modern ML stacks. Focus on architecting scalable model inference, optimizing high-throughput workloads, and ensuring reliability for AI-driven capabilities across the company.

Location: Seattle, Washington; Austin, Texas; Atlanta, Georgia; Bellevue, Washington

Company

Slack AI, part of hirify.global, builds AI-powered features to transform workflows by unlocking knowledge and reducing noise in Slack.

What you will do

  • Design, build, and operate systems for training, serving, and deploying ML models at scale, with a focus on reliability and performance
  • Evolve GPU-backed inference infrastructure for high-throughput, low-latency workloads, including large-scale model serving
  • Architect distributed training and data processing using Ray, Airflow, Spark, or similar tools
  • Build Kubernetes-based platforms with KubeRay, vLLM, and internal services
  • Develop monitoring, observability, and alerting for production ML workloads
  • Partner with AI Platform, ML modeling, security, and product teams on infrastructure for evolving AI use cases
  • Provide technical leadership through design reviews, mentorship, and architecture direction

Requirements

  • Significant experience in software engineering focused on infrastructure, backend, platform engineering, or MLOps
  • Deep expertise in distributed systems and Kubernetes/container platforms
  • Hands-on experience with ML infrastructure stacks such as Ray, KubeRay, and vLLM
  • Experience with GPU infrastructure optimization and management at scale
  • Strong knowledge of data orchestration tools such as Airflow and Spark
  • Experience with cloud-native systems on AWS, GCP, or Azure using infrastructure as code
  • Ability to drive technical direction while balancing short- and long-term goals
  • Excellent written communication for an asynchronous, global team
  • Related technical degree

Culture & Benefits

  • Work in a globally distributed infrastructure team
  • Thrive in an asynchronous communication environment
  • Contribute to engineering blog posts and thought leadership