TL;DR
Research Engineer, Machine Learning (Reinforcement Learning): Collaborate with researchers and engineers to advance the capabilities and safety of large language models, with an emphasis on implementing novel approaches and contributing to research direction. The role focuses on fundamental research in reinforcement learning, creating 'agentic' models via tool use, and improving reasoning abilities in areas like mathematics.
Location: London, UK. This is a hybrid role requiring staff to be in one of the offices at least 25% of the time. Visa sponsorship is available.
Company
hirify.global is a public benefit corporation focused on creating reliable, interpretable, and steerable AI systems for societal benefit.
What you will do
- Lead reinforcement learning research and development for hirify.global's AI systems.
- Develop systems that enable models to use computers effectively, and advance code generation.
- Pioneer fundamental RL research for large language models, improving model reasoning.
- Architect and optimize core RL infrastructure and distributed experiment management across GPU clusters.
- Design, implement, and test novel training environments, evaluations, and methodologies for RL agents.
- Drive performance improvements through profiling, optimization, and debugging distributed systems.
Requirements
- Proficiency in Python and async/concurrent programming (e.g., Trio).
- Experience with machine learning frameworks (PyTorch, TensorFlow, JAX).
- Industry experience in machine learning research.
- Ability to balance research exploration with engineering implementation.
- Strong systems design and communication skills.
- Passion for the potential impact of AI and commitment to developing safe and beneficial systems.
Nice to have
- Familiarity with LLM architectures and training methodologies.
- Experience with reinforcement learning techniques and environments.
- Experience with virtualization, sandboxed code execution, or Kubernetes.
- Experience with distributed systems or high-performance computing.
- Experience with Rust and/or C++.
Culture & Benefits
- Work as a single cohesive team on a few large-scale research efforts.
- Focus on impact: advancing long-term goals of steerable, trustworthy AI.
- Extremely collaborative group with frequent research discussions.
- Competitive compensation and benefits, with optional equity donation matching.
- Generous vacation and parental leave, flexible working hours.
- Lovely office space in London for collaboration.