TL;DR

Senior Research Scientist (Language AI): Designing, implementing, and deploying cutting-edge research in reinforcement learning and post-training for large language models, with an emphasis on aligning models with human intent and enabling general capabilities. The role focuses on building and deploying state-of-the-art reinforcement learning pipelines at scale and driving innovations into production for %hirify_global%'s post-training stack.

Location: Hybrid in Berlin, Cologne, Hamburg, Munich, or London (office attendance required twice a week)

Company

%hirify_global% is a global communications platform powered by Language AI, focused on breaking down language barriers with human-sounding translations and intelligent writing suggestions for over 100,000 businesses worldwide.

What you will do

Design, implement, and deploy cutting-edge research in reinforcement learning and post-training at scale.
Build and deploy state-of-the-art reinforcement learning pipelines.
Post-train large (multi-modal) models to align with human intent and enable reasoning capabilities.
Manage the entire research and production lifecycle from idea conception to production deployment.
Foster external collaborations with academic and industrial partners.
Collaborate with Engineering, ML Platform, and HPC teams to deliver robust model updates.

Requirements

Deep technical background, strong leadership skills, and a proven track record of bringing reinforcement learning or large-scale model alignment work to production.
Strong practical background, creative mindset, and passion for solving hard problems with real-world impact.
Solid mathematical background (Master's, PhD, or equivalent industry experience in mathematics, physics, computer science, or a related field).
Deep practical experience in Python and at least one modern machine learning framework (PyTorch, TensorFlow, or JAX).
Track record of leading self-directed research projects that deliver tangible results.
Willingness to work a hybrid schedule, coming into the office twice a week in Berlin, Cologne, Hamburg, Munich, or London.

Nice to have

Experience working with large compute clusters and ML infrastructure.
Expertise in deep reinforcement learning (RLHF/RLAIF/RLVR).
Hands-on experience scaling and deploying LLMs or other foundation models in real-world systems.

Culture & Benefits

Diverse and internationally distributed team (90+ nationalities).
Open communication, regular feedback, and a culture valuing empathy and growth mindset.
Flexible working hours and trust in productivity.
Monthly full-day hacking sessions ("Hack Fridays").
30 days of annual leave and access to mental health resources.
Competitive, location-tailored benefits package.
Virtual Shares, linking employee contribution to %hirify_global%’s growth.