Software Engineer (Multiple Levels) (Machine Learning Infrastructure)
Job description
TL;DR
Software Engineer (Multiple Levels) (Machine Learning Infrastructure): Design, build, and operate systems to train, serve, and deploy machine learning models at scale, with an emphasis on reliability, performance, and operational simplicity. Focus on distributed training and data processing, high-throughput, low-latency GPU-backed inference, and Kubernetes-based orchestration platforms for production ML workloads.
Location: Seattle, Washington
Salary: $148,500 - $313,700 annually (base salary)
Company
AI builds an AI-powered operating system to transform how people work inside .
What you will do
- Design, build, and operate systems to train, serve, and deploy machine learning models at scale.
- Evolve GPU-backed inference infrastructure for high-throughput, latency-sensitive workloads.
- Architect and optimize distributed training and data processing using platforms such as Ray, Airflow, Spark, or similar.
- Build and maintain Kubernetes-based platforms and orchestration layers using KubeRay, vLLM, and related services.
- Implement monitoring, observability, and alerting for production ML workloads.
- Provide technical leadership via design reviews, mentorship, and engineering standards; author architecture documentation.
Requirements
- Significant professional software engineering experience focused on infrastructure, backend systems, platform engineering, or MLOps.
- Deep experience with distributed systems and expert-level knowledge of Kubernetes and container-based platforms.
- Hands-on experience with modern ML infrastructure and serving stacks such as Ray/KubeRay and vLLM (or similar).
- Experience with GPU infrastructure performance optimization and operational management at scale.
- Experience with data infrastructure and orchestration technologies such as Airflow and Spark (or similar).
- Experience building and operating cloud-native systems on AWS, GCP, or Azure, including infrastructure as code. A related technical degree is required.
Culture & Benefits
- Benefits include time off programs, medical, dental, vision, mental health support, paid parental leave, life and disability insurance, 401(k), and an employee stock purchasing program.
- Compensation is determined by location, job level, and experience; base salary range provided for this position.
- Work in an asynchronous, globally distributed infrastructure team.
Hiring process
- Recruiting and resume assessment may use AI tools; final selection and hiring decisions are made by humans.
- Interviews and evaluation focus on engineering experience, infrastructure/ML systems expertise, and technical communication.