TL;DR

Research Scientist (AI): Design, implement, and deploy cutting-edge research in reinforcement learning and post-training at scale, driving innovations that make it into production. The focus is on post-training large models to align them with human intent and enable general capabilities such as reasoning, pushing the boundaries of model performance, safety, and efficiency.

Location: Hybrid work schedule, with team members coming into the office twice a week in Berlin, Cologne, Hamburg, Munich, or London.

Company

%hirify_global% is a global communications platform powered by Language AI, offering human-sounding translations and intelligent writing suggestions designed with enterprise security in mind.

What you will do

Build and deploy state-of-the-art reinforcement learning pipelines at scale.
Post-train large (multimodal) models to align them with human intent and enable general capabilities such as reasoning, pushing the boundaries of model performance, safety, and efficiency.
Keep the entire lifecycle of research and production in mind: from idea conception through theoretical modeling, prototyping, and ablation studies, all the way to production deployment.
Build and foster external collaborations with academic and industrial partners.
Follow scientific and technical standards for experimentation, reproducibility, and model evaluation.
Collaborate deeply with Engineering, ML Platform, and HPC teams to deliver robust and reliable model updates to users.

Requirements

A solid mathematical background and a love of solving challenging problems, evidenced by a master's degree, diploma, PhD, or equivalent industry experience in mathematics, physics, computer science, or a related field.
Deep practical experience in Python and at least one modern machine learning framework such as PyTorch, TensorFlow, or JAX; experience working with large compute clusters and ML infrastructure is a plus.
A track record of leading self-directed research projects that go well beyond academic exercises and deliver tangible results.
Expertise in deep reinforcement learning (RLHF/RLAIF/RLVR) is a plus.
Hands-on experience scaling and deploying LLMs or other foundation models in real-world systems is a plus.

Culture & Benefits

Diverse and internationally distributed team with people of more than 90 nationalities.
Open communication, smooth collaboration, and regular, direct, actionable feedback; we believe that leading with empathy and a growth mindset makes us better together.
Hybrid work schedule, with team members coming into the office twice a week.
Flexible working hours.
30 days of annual leave (excluding public holidays) and access to mental health resources.
Virtual Shares.