AI Research Engineer (LLM Inference)
Job description
TL;DR
Design and run experiments to understand how model architecture decisions propagate into LLM inference behavior, morph open-weight models into architecture variants optimized for speed, and turn the results into measurable gains in generation speed and model quality. The emphasis is on inference-aware architecture research under hardware and distributed-communication constraints. The role also covers scaling MoE inference, owning the post-training pipeline (fine-tuning, evaluation, adaptation), and writing up findings for top venues and conferences.
Location: Paris, France (hybrid, at least 50% of time in the Paris office)
Company
builds an LLM inference engine optimized for high-throughput generation on standard datacenter GPUs.
What you will do
- Design new model architecture variants (routing strategies, attention mechanisms, MoE structure) using execution constraints as a first-order input.
- Extend the Laneformer thesis by exploring inference-aware architectural variants (e.g., DTP, Ladder Residual, PT-Transformer) and identifying what compounds at scale.
- Own the post-training pipeline across fine-tuning, evaluation methodology, and adaptation of open-weight models toward inference-speed-optimized architecture variants.
- Scale the stack to large MoE models (e.g., DeepSeek v4, Qwen 3), working through routing, expert parallelism, and inference-time communication patterns.
- Write research papers, submit to top venues, and present at conferences.
- Contribute to building AI agents that autonomously run architecture research and training experiments.
Requirements
- Experience with complex AI problems and evidence of serious technical thinking (paper, repository, thesis, or equivalent technical work).
- Strong understanding of Transformers and MoE, with enough depth to reason across trade-offs (including how communication structure and layer dependencies affect inference behavior).
- Experience adapting or modifying existing model architectures and producing concrete results.
- Comfort working at the intersection of model design and hardware constraints.
- Ability to work in a hybrid setup with at least 50% of time in the Paris office.
Nice to have
- Experience with post-training methods such as fine-tuning, preference optimization, or quantization.
- Exposure to production-scale systems (not required).
Culture & Benefits
- Direct access to AMD and NVIDIA datacenter GPUs from day one.
- Small team where creativity and technical judgment directly influence key decisions.
- Work focuses on the critical path of model execution speed and its impact on system capabilities.
- Remote-friendly working model, with at least 50% of time spent in the Paris office.
Hiring process
- Review of technical evidence (papers, repositories, theses, or equivalent projects) and discussion of relevant research/engineering work.
- Interviews focused on architecture/inference reasoning and experimentation approach.