Company hidden
21 hours ago

AI Research Engineer (LLM Inference)

Work format: hybrid
Employment type: full-time
English: B2
Country: France
Job from Hirify.Global, a list of international tech companies

Job description

TL;DR

AI Research Engineer (LLM Inference): Design and run experiments to understand how model architecture decisions propagate into LLM inference behavior, morph open-weight models into architecture variants optimized for speed, and turn the results into measurable gains in generation speed and model quality. The emphasis is on inference-aware architecture research under hardware and distributed-communication constraints. The role also covers scaling MoE inference, owning the post-training pipeline (fine-tuning, evaluation, adaptation), and writing up findings for top venues and conferences.

Location: Paris, France; hybrid (at least 50% of time in the Paris office)

Company

hirify.global builds an LLM inference engine optimized for high-throughput generation on standard datacenter GPUs.

What you will do

  • Design new model architecture variants (routing strategies, attention mechanisms, MoE structure) using execution constraints as a first-order input.
  • Extend the Laneformer thesis by exploring inference-aware architectural variants (e.g., DTP, Ladder Residual, PT-Transformer) and identifying what compounds at scale.
  • Own the post-training pipeline across fine-tuning, evaluation methodology, and adaptation of open-weight models toward inference-speed-optimized architecture variants.
  • Scale the stack to large MoE models (e.g., DeepSeek v4, Qwen 3), working through routing, expert parallelism, and inference-time communication patterns.
  • Write research papers, submit to top venues, and present at conferences.
  • Contribute to building AI agents that autonomously run architecture research and training experiments.

Requirements

  • Experience with complex AI problems and evidence of serious technical thinking (paper, repository, thesis, or equivalent technical work).
  • Strong understanding of Transformers and MoE, with enough depth to reason across trade-offs (including how communication structure and layer dependencies affect inference behavior).
  • Experience adapting or modifying existing model architectures and producing concrete results.
  • Comfort working at the intersection of model design and hardware constraints.
  • Ability to work in a hybrid setup with at least 50% of time in the Paris office.

Nice to have

  • Experience with post-training methods such as fine-tuning, preference optimization, or quantization.
  • Exposure to production-scale systems (not required).

Culture & Benefits

  • Direct access to AMD and NVIDIA datacenter GPUs from day one.
  • Small team where creativity and technical judgment directly influence key decisions.
  • Work focuses on the critical path of model execution speed and its impact on system capabilities.
  • Remote-friendly working model, with at least 50% of time required in the Paris office.

Hiring process

  • Review of technical evidence (papers, repositories, theses, or equivalent projects) and discussion of relevant research/engineering work.
  • Interviews focused on architecture/inference reasoning and experimentation approach.
