Company hidden
Posted 5 days ago

Member of Engineering (Scalability, AI)

Work format
Remote (Europe/United States only)
Employment type
Full-time
English
B2
Listing from Hirify Global, a list of international tech companies

Job description

TL;DR

Member of Engineering (Scalability) (AI/LLMs): Building distributed training and inference infrastructure for Large Language Models, with an emphasis on software reliability, fault tolerance, and hardware fault detection. The focus is on cross-platform checkpointing, NCCL recovery, minimizing GPU idle time during faults, and developing tools for training recovery.

Location: Remote (EMEA/East Coast). Monthly in-person collaboration in Paris (Mon-Wed, optional).

Company

hirify.global aims to reach AGI by accelerating software development with agentic AI systems and frontier models deployed into enterprise development environments.

What you will do

  • Identify, study, and troubleshoot hardware problems during large-scale training.
  • Minimize GPU idle time during faults, both operationally and strategically.
  • Design and develop tools and add-ons to accelerate training recovery.
  • Improve performance and reliability of checkpointing.
  • Write high-quality Python (PyTorch), Cython, C/C++, and CUDA code.

Requirements

  • Understanding of LLMs, Transformers, and deep learning fundamentals.
  • Strong engineering background with Linux API/kernel experience.
  • Programming: Python (NumPy, PyTorch/JAX), C/C++, NCCL, and strong algorithmic skills.
  • Distributed systems: reliability, observability, fault-tolerance, K8s.
  • Fast learner, ready for a steep learning curve, comfortable with modern tools, and a strong critical thinker.

Culture & Benefits

  • Fully remote with flexible hours.
  • 37 days per year of vacation & holidays.
  • Health insurance allowance for you & dependents.
  • Company equipment, well-being/learning/home office allowances.
  • Frequent team get-togethers and a diverse, inclusive culture.

Hiring process

  • Intro call with Founding Engineer.
  • Technical interview(s) with Founding Engineer.
  • Team fit call with People team.
  • Final interview with Founding Engineer.
