ML Engineer (AI)
Job description
TL;DR
ML Engineer (AI/LLMs): Train, post-train, and evaluate core LLMs for an AI safety platform, with an emphasis on SFT, RLHF, and DPO-style alignment. The role focuses on building large-scale data pipelines, running distributed multi-GPU training, and optimizing production inference.
Location: Hybrid in Paris or London (Relocation package available for Paris only)
Compensation: $120K – $250K + Equity
Company
AI safety company building a reliability and optimization layer for AI systems using natural-language policies to enforce model behavior.
What you will do
- Train and post-train LLMs using SFT, RLHF, DPO, and related alignment methods.
- Build reward models based on human and synthetic preference data.
- Design and manage high-throughput data pipelines for collection, filtering, and quality control at scale.
- Execute distributed training on multi-GPU clusters and debug performance issues.
- Develop evaluation systems and benchmarks to drive training decisions.
- Optimize models for production inference using quantization, speculative decoding, and vLLM/TensorRT.
Requirements
- Hands-on experience with modern LLM post-training (SFT, RLHF, DPO) on models you trained yourself.
- Experience building large-scale data pipelines for training corpora and synthetic data.
- Proficiency in PyTorch or JAX with experience in distributed multi-GPU training.
- Deep understanding of model evaluation and the ability to build reliable benchmarks.
- Experience with inference optimization tools like vLLM, TensorRT, or Triton.
- Must be based in or able to relocate to Paris or London.
Nice to have
- Public builder footprint (open-source models, datasets, or papers on HuggingFace/GitHub).
- Experience at frontier or near-frontier AI labs.
- Knowledge of advanced RL methods for LLMs (e.g., online RL, GRPO-style methods).
- Experience with large-scale moderation, safety, or classification models.
- Experience in multilingual model training.
Culture & Benefits
- Paid time off in accordance with local regulations.
- Comprehensive medical insurance for the France-based team.
- Relocation support available for candidates moving to Paris.
- Full provision of necessary hardware, AI agent subscriptions, and IDEs.
- Bi-annual team off-sites in diverse locations.
Hiring process
- Introductory call with HR (25 min).
- Take-home technical task.
- Technical interview with the Head of Applied Research (60 min).
- Final conversation with the CEO (45 min).