Research Scientist (AI Safety)
Job description
TL;DR
Research Scientist (AI Safety): Study how LLM agents fail in the wild and elicit unsafe behaviors through concrete experiments, with an emphasis on agent misalignment and deception. The role focuses on developing automated audit agents, pressure-testing frontier models, and building empirical benchmarks for AI safety.
Location: Hybrid in Paris or London. Relocation package available for Paris only.
Salary: $150K – $250K + Equity
Company
AI Safety company building a reliability and optimization layer for AI systems using natural-language policies.
What you will do
- Own end-to-end research projects from hypothesis formulation to falsifiable experiments and results.
- Develop automated audit agents to discover and characterize suspect model behavior at scale.
- Study misalignment and bias in real user interactions to create shippable evals.
- Pressure-test frontier agents in high-stakes scenarios to identify failure modes.
- Perform white-box and black-box investigations of AI model failures.
- Publish findings in public blog posts and conference papers.
Requirements
- Proven track record of empirical research in agent behavior, model evaluation, or alignment.
- Strong ML engineering skills, including the ability to independently build MVPs involving fine-tuning and agent inference.
- Expertise in experimental design, including isolating failure modes and calibrating baselines.
- Ability to define and iterate on experiments for vague behavioral questions.
- Fluent AI power user with hands-on experience with frontier models and coding agents.
Nice to have
- Publications at A* venues (NeurIPS, ICML, ICLR, ACL).
- Depth in interpretability (NLAs, SAEs, persona vectors).
- MSc or PhD in ML, Computer Science, Physics, or related quantitative fields.
- AI safety fellowship (MATS, ASTRA, etc.).
Culture & Benefits
- Paid time off according to local regulations.
- Comprehensive medical insurance for France-based employees.
- Provision of all necessary hardware, tools, and AI agent/IDE subscriptions.
- Bi-annual team off-sites (e.g., Alps, Saint-Tropez).
Hiring process
- Introductory HR call (25 min).
- Take-home technical test.
- Technical interview with the Head of Fundamental Research (60 min).
- Final conversation with the CEO (45 min).