Описание вакансии
Machine Learning Infrastructure Engineer
Company
TRM Labs
Conditions
Posted 1 day ago · Senior · San Francisco, USA · Hybrid · Full Time
Skills
Autoscaling, Parallel Computing, ONNX Runtime, TensorRT, Performance Engineering, GPU, Distributed Systems, Inference, CUDA, Observability, AWS, GCP, vLLM, Triton, Batching, Model Parallelism, GPU Cluster, Kubernetes, Tensor Parallelism, FlashAttention, Token Throughput
About the Role
You will design, build, and operate GPU-backed infrastructure to run production ML and LLM workloads. You will optimize inference systems for throughput and cost, implement model optimization and compilation workflows, and support distributed inference patterns such as model and tensor parallelism. You will schedule heterogeneous workloads across accelerators, instrument systems for GPU load, memory, batching, and token throughput, and work with engineering and ML teams to transition models from experimentation to reliable production services.
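The throughput-versus-cost tradeoff mentioned above is driven largely by batching: each decode step pays a roughly fixed scheduling/kernel-launch overhead, so serving more sequences per step amortizes that cost. A minimal sketch with a toy cost model (the overhead and per-token constants are illustrative assumptions, not measured numbers):

```python
def run_inference_steps(batch_size: int, num_tokens: int,
                        per_step_overhead: float = 1.0,
                        per_token_cost: float = 0.1) -> float:
    """Toy cost model: each decode step pays a fixed overhead plus a
    small per-token cost. Returns total simulated time units."""
    return num_tokens * (per_step_overhead + batch_size * per_token_cost)

def tokens_per_unit_time(batch_size: int, num_tokens: int = 100) -> float:
    total_tokens = batch_size * num_tokens
    return total_tokens / run_inference_steps(batch_size, num_tokens)

# Larger batches amortize the fixed per-step overhead, so aggregate
# token throughput rises even though per-sequence latency grows.
print(tokens_per_unit_time(1))   # ~0.91 tokens per time unit
print(tokens_per_unit_time(16))  # ~6.15 tokens per time unit
```

Production schedulers such as vLLM's continuous batching refine this idea by admitting and retiring sequences mid-flight rather than waiting for a full batch to finish.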
Requirements
- Bachelor's degree or equivalent in Computer Science or related field
- 5+ years of experience building and operating distributed systems or infrastructure in production
- Experience deploying and operating ML/LLM inference workloads on GPU clusters in cloud environments (AWS and/or GCP)
- Deep understanding of high-throughput inference systems including batching strategies and token throughput optimization
- Experience with ML serving frameworks such as Triton Inference Server, vLLM, Ray Serve, ONNX Runtime, or HuggingFace Optimum
- Experience optimizing GPU load, memory efficiency, and production performance bottlenecks
- Familiarity with distributed inference strategies including model parallelism and tensor parallelism
- Experience working with Kubernetes or equivalent orchestration systems
- Familiarity with heterogeneous accelerators (e.g., Inferentia) is a plus
- CUDA familiarity and experience debugging GPU-related issues is a plus
- Adaptable and autonomous with excellent communication and collaboration skills
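The tensor-parallelism item in the requirements boils down to sharding a layer's weight matrix across devices. A minimal pure-Python sketch of column-parallel matmul (real systems shard across GPUs and use collectives; the nested-list representation and two-shard split here are illustrative only):

```python
def matmul(x, w):
    """Plain matmul over nested lists: x is (n, k), w is (k, m)."""
    return [[sum(x[i][t] * w[t][j] for t in range(len(w)))
             for j in range(len(w[0]))] for i in range(len(x))]

def column_parallel_matmul(x, w, num_shards):
    """Split w column-wise across `num_shards` workers; each worker
    computes its slice of the output, then slices are concatenated
    (the all-gather step in a real tensor-parallel layer)."""
    m = len(w[0])
    shard = m // num_shards  # assumes m divides evenly, for simplicity
    partials = []
    for s in range(num_shards):
        w_s = [row[s * shard:(s + 1) * shard] for row in w]
        partials.append(matmul(x, w_s))
    # Concatenate the per-shard outputs along the output dimension.
    return [sum((p[i] for p in partials), []) for i in range(len(x))]

x = [[1.0, 2.0], [3.0, 4.0]]
w = [[1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 1.0, 3.0]]
assert column_parallel_matmul(x, w, 2) == matmul(x, w)
print(column_parallel_matmul(x, w, 2))  # [[1.0, 2.0, 4.0, 7.0], [3.0, 4.0, 10.0, 15.0]]
```

Model parallelism, by contrast, places whole layers on different devices and pipelines activations between them.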
Responsibilities
- Design and operate GPU cluster infrastructure
- Optimize high-throughput inference
- Enable distributed inference strategies
- Implement model optimization and compilation workflows
- Schedule heterogeneous workloads
- Build observability into ML infrastructure
- Partner across engineering teams to transition models to production
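The observability responsibility typically starts with a token-throughput gauge computed over a sliding window. A minimal stdlib-only sketch (the class and method names are illustrative, not tied to any particular metrics library):

```python
import time
from collections import deque

class InferenceMetrics:
    """Sliding-window token-throughput tracker, the kind of signal that
    would back a tokens/sec gauge in a real metrics pipeline."""

    def __init__(self, window_s: float = 60.0):
        self.window_s = window_s
        self.events = deque()  # (timestamp, token_count) pairs
        self.total_tokens = 0  # lifetime counter

    def record_tokens(self, n: int, now: float = None):
        now = time.monotonic() if now is None else now
        self.events.append((now, n))
        self.total_tokens += n

    def tokens_per_second(self, now: float = None) -> float:
        now = time.monotonic() if now is None else now
        # Evict events that fell out of the window.
        while self.events and self.events[0][0] < now - self.window_s:
            self.events.popleft()
        return sum(n for _, n in self.events) / self.window_s

m = InferenceMetrics(window_s=10.0)
m.record_tokens(500, now=0.0)
m.record_tokens(500, now=5.0)
print(m.tokens_per_second(now=5.0))   # 100.0
print(m.tokens_per_second(now=20.0))  # 0.0 (both events aged out)
```

In practice such gauges are exported alongside GPU utilization and memory metrics (e.g. from NVML) and scraped by the cluster's monitoring stack.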