TL;DR
Research Scientist (AI): Design, implement, and deploy cutting-edge research in reinforcement learning and post-training at scale, with a focus on innovations that make it into production. This includes post-training large models to align them with human intent and enable general capabilities such as reasoning, pushing the boundaries of model performance, safety, and efficiency.
Location: Hybrid work schedule, with team members coming into the office twice a week in Berlin, Cologne, Hamburg, Munich, or London.
Company
hirify.global is a global communications platform powered by Language AI, offering human-sounding translations and intelligent writing suggestions designed with enterprise security in mind.
What you will do
- Build and deploy state-of-the-art reinforcement learning pipelines at scale.
- Post-train large (multi-modal) models to align them with human intent and enable general capabilities such as reasoning, pushing the boundaries of model performance, safety, and efficiency.
- Keep the entire lifecycle of research and production in mind: from idea conception, theoretical modeling, prototyping, ablation studies, all the way to production deployment.
- Build and foster external collaborations with academic and industrial partners.
- Follow scientific and technical standards for experimentation, reproducibility, and model evaluation.
- Collaborate deeply with Engineering, ML Platform, and HPC teams to deliver robust and reliable model updates to users.
Requirements
- A solid mathematical background and a passion for solving challenging problems, evidenced by a master's degree, diploma, PhD, or equivalent industry experience in mathematics, physics, computer science, or a related field.
- Deep practical experience in Python and at least one modern machine learning framework such as PyTorch, TensorFlow, or JAX. Experience working with large compute clusters and ML infrastructure is a plus.
- A track record of leading self-directed research projects that go well beyond academic exercises and deliver tangible results.
- Expertise in deep reinforcement learning (RLHF/RLAIF/RLVR) is a plus.
- Hands-on experience scaling and deploying LLMs or other foundation models in real-world systems is a plus.
Culture & Benefits
- Diverse and internationally distributed team with people of more than 90 nationalities.
- Open communication, regular, direct, and actionable feedback, and smooth collaboration. We believe that leading with empathy and a growth mindset makes us better together.
- Hybrid work schedule, with team members coming into the office twice a week.
- Flexible working hours.
- 30 days of annual leave (excluding public holidays) and access to mental health resources.
- Virtual Shares.