Company hidden
2 months ago

Machine Learning Engineer — Training Optimization (AI)

Work format
remote (Global)
Employment type
full-time
English
B2
Vacancy from Hirify.Global, a list of international tech companies

Job description

TL;DR

Machine Learning Engineer (AI): Optimize large-scale model training pipelines for speed, stability, and cost, with an emphasis on distributed training strategies and hardware efficiency. The role focuses on reducing training time, implementing advanced techniques such as ZeRO and FSDP, and improving throughput for LLMs.

Location: Remote (World)

Company

hirify.global is a Series-A startup developing cutting-edge models and high-performance training systems.

What you will do

  • Optimize throughput, convergence, stability, and cost of large-scale model training pipelines.
  • Improve distributed training strategies utilizing data, model, and pipeline parallelism.
  • Tune optimizers, schedulers, batch sizing, and precision (bf16, fp16, fp8).
  • Profile and analyze system bottlenecks to reduce training time and compute costs.
  • Build and maintain robust infrastructure for checkpointing, fault tolerance, and reproducibility.
  • Evaluate and integrate advanced training techniques such as gradient checkpointing, ZeRO, and FSDP.

Requirements

  • Strong experience training LLMs or similarly large neural networks.
  • Hands-on expertise in training optimization.
  • Deep understanding of backpropagation, optimization algorithms, and training dynamics.
  • Proficiency with PyTorch.
  • Experience working with GPU hardware, memory, and networking constraints.
  • Ability to translate research ideas into production-ready code.

Nice to have

  • Familiarity with DeepSpeed, FSDP, Megatron, or custom training stacks.
  • Experience optimizing training on AMD or NVIDIA GPUs.
  • Contributions to open-source ML infrastructure or research codebases.
  • Exposure to non-Transformer architectures (e.g., RNNs, hybrid models).

Culture & Benefits

  • Real ownership and impact at a Series-A stage company.
  • Opportunity to work on cutting-edge models with a small, highly technical team.
  • Fast feedback loops and a strong emphasis on engineering quality and research rigor.
  • Competitive compensation package including meaningful equity.
