
Senior Machine Learning Infrastructure Engineer (AI)

Work format
hybrid
Employment type
fulltime
Grade
senior
English
b2
Country
US
Vacancy from a Telegram channel -


Job description

Machine Learning Infrastructure Engineer

Company

TRM Labs

Conditions

Senior · San Francisco, USA · Hybrid · Full Time · posted by TRM Labs

Skills

Autoscaling, Parallel Computing, ONNX Runtime, TensorRT, Performance Engineering, GPU, Distributed Systems, Inference, CUDA, Observability, AWS, GCP, vLLM, Triton, Batching, Model Parallelism, GPU Clusters, Kubernetes, Tensor Parallelism, FlashAttention, Token Throughput

About the Role

You will design, build, and operate GPU-backed infrastructure to run production ML and LLM workloads. You will optimize inference systems for throughput and cost, implement model optimization and compilation workflows, and support distributed inference patterns such as model and tensor parallelism. You will schedule heterogeneous workloads across accelerators, instrument systems for GPU load, memory, batching, and token throughput, and work with engineering and ML teams to transition models from experimentation to reliable production services.
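The batching and token-throughput concerns described above can be sketched with a toy dynamic batcher. This is an illustrative sketch only (all names are hypothetical, not TRM Labs code); production servers such as vLLM or Triton use continuous batching, refilling the batch on every decode step rather than per request group.

```python
import time
from dataclasses import dataclass, field

@dataclass
class Request:
    prompt: str
    max_new_tokens: int

@dataclass
class DynamicBatcher:
    """Toy dynamic batcher: collect requests until the batch is full
    or a deadline passes, then run them together to raise GPU
    utilization and token throughput."""
    max_batch_size: int = 8
    max_wait_s: float = 0.005
    queue: list = field(default_factory=list)

    def submit(self, req: Request) -> None:
        self.queue.append(req)

    def next_batch(self) -> list:
        deadline = time.monotonic() + self.max_wait_s
        while len(self.queue) < self.max_batch_size and time.monotonic() < deadline:
            time.sleep(0.001)  # in a real server: await new arrivals
        batch = self.queue[:self.max_batch_size]
        self.queue = self.queue[self.max_batch_size:]
        return batch

def run_batch(batch):
    """Stand-in for a model forward pass; returns tokens 'generated' per request."""
    return [req.max_new_tokens for req in batch]

batcher = DynamicBatcher()
for i in range(10):
    batcher.submit(Request(prompt=f"q{i}", max_new_tokens=4))
batch = batcher.next_batch()
tokens = sum(run_batch(batch))
print(len(batch), tokens)  # 8 requests batched, 32 tokens generated
```

The trade-off the sketch exposes is the one the role optimizes in practice: a larger `max_batch_size` improves throughput, while a smaller `max_wait_s` bounds per-request latency.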

Requirements

  • Bachelor's degree or equivalent in Computer Science or related field
  • 5+ years of experience building and operating distributed systems or infrastructure in production
  • Experience deploying and operating ML/LLM inference workloads on GPU clusters in cloud environments (AWS and/or GCP)
  • Deep understanding of high-throughput inference systems including batching strategies and token throughput optimization
  • Experience with ML serving frameworks such as Triton Inference Server, vLLM, Ray Serve, ONNX Runtime, or HuggingFace Optimum
  • Experience optimizing GPU load, memory efficiency, and production performance bottlenecks
  • Familiarity with distributed inference strategies including model parallelism and tensor parallelism
  • Experience working with Kubernetes or equivalent orchestration systems
  • Familiarity with heterogeneous accelerators (e.g., Inferentia) is a plus
  • CUDA familiarity and experience debugging GPU-related issues is a plus
  • Adaptable and autonomous with excellent communication and collaboration skills
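The tensor-parallelism requirement above can be illustrated with a minimal pure-Python sketch of column-parallel sharding (Megatron-style): each worker holds a slice of a weight matrix's output columns, computes its partial result independently, and the shards' outputs are concatenated. This is a conceptual sketch under simplified assumptions; real systems shard across GPUs and use collectives such as all-gather.

```python
def matmul(x, w):
    """x: (n, k) times w: (k, m) -> (n, m), using plain Python lists."""
    return [[sum(xi[t] * w[t][j] for t in range(len(w)))
             for j in range(len(w[0]))]
            for xi in x]

def split_columns(w, shards):
    """Column-parallel sharding: each worker gets a contiguous slice
    of the weight's output columns."""
    step = len(w[0]) // shards
    return [[row[s * step:(s + 1) * step] for row in w] for s in range(shards)]

def tensor_parallel_matmul(x, w, shards=2):
    # Each shard computes its partial output independently; no
    # communication is needed until the results are concatenated.
    partials = [matmul(x, w_s) for w_s in split_columns(w, shards)]
    return [sum((p[i] for p in partials), []) for i in range(len(x))]

x = [[1.0, 2.0]]
w = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]]
print(tensor_parallel_matmul(x, w, shards=2) == matmul(x, w))  # True
```

Column-parallel layers like this are typically paired with row-parallel layers so that only one reduction is needed per pair, which is why tensor parallelism is usually confined to within a single node's fast interconnect.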

Responsibilities

  • Design and operate GPU cluster infrastructure
  • Optimize high-throughput inference
  • Enable distributed inference strategies
  • Implement model optimization and compilation workflows
  • Schedule heterogeneous workloads
  • Build observability into ML infrastructure
  • Partner across engineering teams to transition models to production
