RL Engineer (AI)
Job description
TL;DR
RL Engineer (AI): Building training infrastructure and reward pipelines for web data extraction models, with an emphasis on reinforcement learning and LLM agents. The role focuses on bridging classical RL with modern LLM systems and achieving SOTA results in structured data extraction.
Location: Must be based in the Americas (UTC-3 to UTC-10); Hybrid in San Francisco, CA or Remote.
Salary: $180,000 – $290,000 per year
Company
A high-growth AI startup providing an API that reliably converts URLs into LLM-ready markdown or structured data.
What you will do
- Build training infrastructure, reward pipelines, and evaluation frameworks from scratch.
- Fine-tune foundation models to achieve state-of-the-art results in web data extraction and content understanding.
- Bridge classical RL techniques (PPO, RLHF) with modern LLM-based agent workflows.
- Design and execute fast-paced experiments to test hypotheses and iterate on model performance.
- Collaborate with research and engineering teams to integrate RL improvements into the product roadmap.
Requirements
- 3+ years of experience in applied RL, ML engineering, or model training with production systems.
- Proven ability to build training loops, manage GPU clusters, and debug convergence issues.
- Deep understanding of PPO, RLHF, reward modeling, and policy optimization.
- Ability to translate complex RL metrics into clear, actionable insights for non-experts.
- US citizenship or a visa is required for those working from the San Francisco office.
- Must be located within the Americas time zones (UTC-3 to UTC-10).
Culture & Benefits
- Competitive compensation with equity (up to 0.15%).
- Generous PTO (15 days mandatory) and 12 weeks of fully paid parental leave.
- Comprehensive health, dental, and vision insurance (100% coverage for employees).
- Wellness stipend ($100/month) and professional development budget ($1,000/year).
- Retirement planning via 401(k) and various supplemental insurance options.
- Unique perks including team offsites and a 3-month paid sabbatical after 4 years.
Hiring process
- Introductory chat to discuss background and goals.
- Technical deep dive focusing on RL, model training, and a live problem-solving session.
- Founder chat to evaluate culture fit and ownership mindset.
- Paid work trial (1–2 weeks) tackling a real production RL/fine-tuning problem.