Senior Compute Platform Engineer (AI)
Job description
TL;DR
Senior Compute Platform Engineer (Golang/Kubernetes): Designing and operating high-scale batch compute and workflow orchestration systems for autonomous trucking, with an emphasis on Kubernetes clusters, resource optimization, and multi-tenant scheduling. The role focuses on improving the reliability and fault tolerance of large-scale distributed jobs and platform abstractions.
Location: Pittsburgh, PA or Remote (Must be a U.S. person/citizen due to national security laws)
Company
Develops autonomous technology and AI systems for the trucking industry to enhance safety and efficiency.
What you will do
- Design and operate distributed systems for scheduling large-scale batch workloads across Kubernetes clusters.
- Build and maintain compute platform abstractions and optimize resource utilization.
- Develop multi-tenant scheduling strategies to improve system efficiency.
- Enhance the reliability and fault tolerance of large-scale distributed jobs and platform components.
- Collaborate with cross-functional teams to align platform capabilities with workload requirements.
- Contribute to platform tooling, automation, and CI/CD workflows.
Requirements
- 7+ years of experience building and operating distributed systems or infrastructure platforms.
- Strong production-grade experience with Kubernetes and container orchestration.
- Proficiency in Golang and Python.
- Experience designing and operating large-scale batch compute systems.
- Experience with batch scheduling systems such as Kueue, Armada, Volcano, or Slurm.
- Must comply with U.S. national security laws (U.S. person status and/or citizenship status required).