
Senior Machine Learning Infrastructure Engineer (AI)

Work format
hybrid
Employment type
fulltime
Grade
senior
English
b2
Country
US
Vacancy from a Telegram channel -


Job description

Machine Learning Infrastructure Engineer

Company

TRM Labs

Conditions

Senior · San Francisco, USA · Hybrid · Full Time · posted by TRM Labs

Skills

Autoscaling, Parallel Computing, ONNX Runtime, TensorRT, Performance Engineering, GPU, Distributed Systems, Inference, CUDA, Observability, AWS, GCP, vLLM, Triton, Batching, Model Parallelism, GPU Clusters, Kubernetes, Tensor Parallelism, FlashAttention, Token Throughput

About the Role

You will design, build, and operate GPU-backed infrastructure to run production ML and LLM workloads. You will optimize inference systems for throughput and cost, implement model optimization and compilation workflows, and support distributed inference patterns such as model and tensor parallelism. You will schedule heterogeneous workloads across accelerators, instrument systems for GPU load, memory, batching, and token throughput, and work with engineering and ML teams to transition models from experimentation to reliable production services.
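The batching and token-throughput concerns described above can be sketched with a toy dynamic batcher. This is an illustrative sketch only (all names are hypothetical, not TRM Labs code); production servers such as vLLM or Triton use continuous batching, refilling the batch on every decode step rather than per request group.

```python
import time
from dataclasses import dataclass, field

@dataclass
class Request:
    prompt: str
    max_new_tokens: int

@dataclass
class DynamicBatcher:
    """Toy dynamic batcher: collect requests until the batch is full
    or a deadline passes, then run them together to raise GPU
    utilization and token throughput."""
    max_batch_size: int = 8
    max_wait_s: float = 0.005
    queue: list = field(default_factory=list)

    def submit(self, req: Request) -> None:
        self.queue.append(req)

    def next_batch(self) -> list:
        deadline = time.monotonic() + self.max_wait_s
        while len(self.queue) < self.max_batch_size and time.monotonic() < deadline:
            time.sleep(0.001)  # in a real server: await new arrivals
        batch = self.queue[:self.max_batch_size]
        self.queue = self.queue[self.max_batch_size:]
        return batch

def run_batch(batch):
    """Stand-in for a model forward pass; returns tokens 'generated' per request."""
    return [req.max_new_tokens for req in batch]

batcher = DynamicBatcher()
for i in range(10):
    batcher.submit(Request(prompt=f"q{i}", max_new_tokens=4))
batch = batcher.next_batch()
tokens = sum(run_batch(batch))
print(len(batch), tokens)  # 8 requests batched, 32 tokens generated
```

The trade-off the sketch exposes is the one the role optimizes in practice: a larger `max_batch_size` improves throughput, while a smaller `max_wait_s` bounds per-request latency.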

Requirements

  • Bachelor's degree or equivalent in Computer Science or related field
  • 5+ years of experience building and operating distributed systems or infrastructure in production
  • Experience deploying and operating ML/LLM inference workloads on GPU clusters in cloud environments (AWS and/or GCP)
  • Deep understanding of high-throughput inference systems including batching strategies and token throughput optimization
  • Experience with ML serving frameworks such as Triton Inference Server, vLLM, Ray Serve, ONNX Runtime, or HuggingFace Optimum
  • Experience optimizing GPU load, memory efficiency, and production performance bottlenecks
  • Familiarity with distributed inference strategies including model parallelism and tensor parallelism
  • Experience working with Kubernetes or equivalent orchestration systems
  • Familiarity with heterogeneous accelerators (e.g., Inferentia) is a plus
  • CUDA familiarity and experience debugging GPU-related issues is a plus
  • Adaptable and autonomous with excellent communication and collaboration skills
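The tensor-parallelism requirement above can be illustrated with a minimal pure-Python sketch of column-parallel sharding (Megatron-style): each worker holds a slice of a weight matrix's output columns, computes its partial result independently, and the shards' outputs are concatenated. This is a conceptual sketch under simplified assumptions; real systems shard across GPUs and use collectives such as all-gather.

```python
def matmul(x, w):
    """x: (n, k) times w: (k, m) -> (n, m), using plain Python lists."""
    return [[sum(xi[t] * w[t][j] for t in range(len(w)))
             for j in range(len(w[0]))]
            for xi in x]

def split_columns(w, shards):
    """Column-parallel sharding: each worker gets a contiguous slice
    of the weight's output columns."""
    step = len(w[0]) // shards
    return [[row[s * step:(s + 1) * step] for row in w] for s in range(shards)]

def tensor_parallel_matmul(x, w, shards=2):
    # Each shard computes its partial output independently; no
    # communication is needed until the results are concatenated.
    partials = [matmul(x, w_s) for w_s in split_columns(w, shards)]
    return [sum((p[i] for p in partials), []) for i in range(len(x))]

x = [[1.0, 2.0]]
w = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]]
print(tensor_parallel_matmul(x, w, shards=2) == matmul(x, w))  # True
```

Column-parallel layers like this are typically paired with row-parallel layers so that only one reduction is needed per pair, which is why tensor parallelism is usually confined to within a single node's fast interconnect.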

Responsibilities

  • Design and operate GPU cluster infrastructure
  • Optimize high-throughput inference
  • Enable distributed inference strategies
  • Implement model optimization and compilation workflows
  • Schedule heterogeneous workloads
  • Build observability into ML infrastructure
  • Partner across engineering teams to transition models to production
