Company hidden
1 month ago

ML Infrastructure Engineer (AI)

Work format
Remote (Europe/United States only)
Employment type
Full-time
English
B2
Country
US
Vacancy from Hirify RU Global, a list of companies with Eastern European roots

Job description

TL;DR

ML Infrastructure Engineer (AI/GPU): Lead and support benchmarking of GPU platforms for machine learning and AI workloads, with an emphasis on performance profiling, kernel-level analysis, and hardware optimization. The role focuses on evaluating architectures and software stacks, debugging bottlenecks, performing acceptance testing, and developing tools for visualization and data-driven decisions.

Location: Remote from Europe or United States. Applicants must be authorized to work in the country in which they apply and provide proof of employment eligibility.

Company

hirify.global is building a full-stack AI cloud platform supporting developers and enterprises from data and model training to production deployment.

What you will do

  • Profile and analyze GPU performance at system and kernel level in collaboration with hardware and development teams.
  • Evaluate and compare GPU performance across platforms, architectures, and software stacks like CUDA and ROCm.
  • Debug and optimize ML workloads on GPU hardware, resolving performance bottlenecks.
  • Conduct acceptance testing for new GPU clusters to ensure performance, stability, and compatibility for AI workloads.
  • Run experiments on diverse GPU configurations to assess interconnect strategies and system optimizations.
  • Develop tools and dashboards to visualize performance metrics, bottlenecks, and trends.
  • Contribute to internal tooling, frameworks, and best practices.

Requirements

  • Strong understanding of the theoretical foundations of machine learning
  • Deep understanding of the performance aspects of training and inference for large neural networks (data/tensor parallelism, offloading, custom kernels, hardware features, attention optimizations, dynamic batching)
  • Deep experience with modern deep learning frameworks (PyTorch, JAX, Megatron-LM, TensorRT-LLM)
  • Good understanding of GPU stack: CUDA, NCCL, drivers, relevant libraries
  • Familiarity with containerized environments (Docker, Kubernetes)
  • Strong communication skills and ability to work independently

Nice to have

  • Familiarity with modern LLM inference frameworks (vLLM, SGLang, TensorRT)
  • Experience with Python and performance profiling tools (Nsight, nvprof, perf)
  • Familiarity with cloud ML platforms (AWS, GCP, Azure ML)
  • Contributions to open-source ML benchmarking tools

Culture & Benefits

  • Competitive compensation
  • Career growth and learning opportunities
  • Flexibility and work-life balance
  • Collaborative and innovative culture
  • Opportunity to work on impactful AI projects
  • International environment with talented teams
  • Fast-moving environment with bold thinking, constant growth, trust, ownership, and impact
