TL;DR
Research Engineer (Agentic Models): Designing, implementing, and maintaining SFT and RL post-training pipelines for multi-step coding agents, with an emphasis on adapting models to agent workflows and building evaluation environments. The role also covers designing evaluation frameworks, analyzing results, and improving model architectures and datasets.
Location: Remote from Germany, or onsite in the Netherlands, Serbia, Germany, Cyprus, the United Kingdom, the Czech Republic, Poland, or Armenia.
Company
hirify.global: Developing powerful and effective developer tools, increasingly integrating AI-powered assistance and agents into IDEs.
What you will do
- Design, implement, and maintain SFT and RL post-training pipelines for multi-step coding agents.
- Train and adapt LLMs for agent workflows, including planning and tool use within hirify.global IDEs.
- Build and extend evaluation and simulation environments for coding agents on realistic developer tasks.
- Design evaluation frameworks and metrics, analyze traces and logs, and close the loop from evaluation back into training and data.
- Analyze training and evaluation results to propose and implement improvements to model architectures and datasets.
- Work with large-scale infrastructure, including distributed GPU and MapReduce clusters.
Requirements
- Hands-on experience training LLMs (pre-training, fine-tuning, or post-training) in a research or production setting.
- Experience with a modern deep learning framework, such as PyTorch, and specialized LLM training stacks.
- Solid understanding of LLM training basics: tokenization, data pipelines, batching, mixed precision, and distributed training.
- Ability to own projects end to end, overseeing design, experimentation, implementation, and iteration.
- A product-aware mindset, translating product needs and failure modes into modeling and evaluation work.
- At least 3 years of Python experience writing clean, maintainable code in modern ML codebases.
Nice to have
- Experience with ML orchestrators and workflow tools such as Kubeflow, Dagster, or Airflow, or with container orchestration platforms like Kubernetes.
- Experience with large-scale data and training pipelines (MapReduce-style clusters, multi-node GPU training).
- Experience designing and maintaining evaluation pipelines for LLMs or agents, including metrics, dashboards, and automated regression checks.
- Experience developing AI agents, such as tool-using agents, planners, or multi-step coding workflows.
- Experience with experiment tracking and observability tools such as Weights & Biases or MLflow.
- Experience with inference optimization and serving optimized models in production.