Software Engineer (Machine Learning Infrastructure)
Job description
TL;DR
Software Engineer (ML Infrastructure): Design, build, and operate foundational systems for large-scale machine learning training, serving, and deployment at Slack, with an emphasis on distributed systems, GPU infrastructure, and modern ML stacks. Focus on architecting scalable model inference, optimizing high-throughput workloads, and ensuring reliability for AI-driven capabilities across the company.
Location: Seattle, Washington; Bellevue, Washington; Austin, Texas; Atlanta, Georgia
Company
Slack AI, part of , builds AI-powered features to transform workflows by unlocking knowledge and reducing noise in Slack.
What you will do
- Design, build, and operate systems for training, serving, and deploying ML models at scale, with a focus on reliability and performance
- Evolve GPU-backed inference infrastructure for high-throughput, low-latency workloads, including large-scale model serving
- Architect distributed training and data processing using Ray, Airflow, Spark, or similar
- Build Kubernetes-based platforms with KubeRay, vLLM, and internal services
- Develop monitoring, observability, and alerting for production ML workloads
- Partner with AI Platform, ML modeling, security, and product teams on infrastructure for evolving AI use cases
- Provide technical leadership through design reviews, mentorship, and architecture direction
Requirements
- Significant experience in software engineering focused on infrastructure, backend, platform engineering, or MLOps
- Deep expertise in distributed systems and Kubernetes/container platforms
- Hands-on experience with ML infrastructure stacks such as Ray, KubeRay, and vLLM
- Experience with GPU infrastructure optimization and management at scale
- Strong knowledge of data orchestration and processing tools such as Airflow and Spark
- Experience with cloud-native systems on AWS, GCP, or Azure, using infrastructure as code
- Ability to drive technical direction while balancing short- and long-term goals
- Excellent written communication skills for an asynchronous, global team
- Related technical degree
Culture & Benefits
- Work in a globally distributed infrastructure team
- Thrive in an asynchronous communication environment
- Contribute to engineering blog posts and thought leadership