RL Engineer (AI)

180 000 - 290 000$

Формат работы

remote (только USA)/hybrid

Тип работы

fulltime

Грейд

senior

Английский

Страна

Вакансия из Hirify Global, списка международных tech-компаний
Для мэтча и отклика нужен Plus

Мэтч & Сопровод

Для мэтча с этой вакансией нужен Plus

Описание вакансии

Текст:

TL;DR

RL Engineer (AI): Building training infrastructure and reward pipelines for web data extraction models with an accent on reinforcement learning and LLM agents. Focus on bridging classical RL with modern LLM systems and achieving SOTA results in structured data extraction.

Location: Must be based in the Americas (UTC-3 to UTC-10); Hybrid in San Francisco, CA or Remote.

Salary: $180,000 – $290,000 per year

Company

hirify.global is a high-growth AI startup providing an API to reliably convert URLs into LLM-ready markdown or structured data.

What you will do

Build training infrastructure, reward pipelines, and evaluation frameworks from scratch.
Fine-tune foundation models to achieve state-of-the-art results in web data extraction and content understanding.
Bridge classical RL techniques (PPO, RLHF) with modern LLM-based agent workflows.
Design and execute fast-paced experiments to test hypotheses and iterate on model performance.
Collaborate with research and engineering teams to integrate RL improvements into the product roadmap.

Requirements

3+ years of experience in applied RL, ML engineering, or model training with production systems.
Proven ability to build training loops, manage GPU clusters, and debug convergence issues.
Deep understanding of PPO, RLHF, reward modeling, and policy optimization.
Ability to translate complex RL metrics into clear, actionable insights for non-experts.
US Citizenship/Visa required for those working from the San Francisco office.
Must be located within the Americas time zones (UTC-3 to UTC-10).

Culture & Benefits

Competitive compensation with equity (up to 0.15%).
Generous PTO (15 days mandatory) and 12 weeks of fully paid parental leave.
Comprehensive health, dental, and vision insurance (100% coverage for employees).
Wellness stipend ($100/month) and professional development budget ($1,000/year).
Retirement planning via 401(k) and various supplemental insurance options.
Unique perks including team offsites and a 3-month paid sabbatical after 4 years.

Hiring process

Introductory chat to discuss background and goals.
Technical deep dive focusing on RL, model training, and a live problem-solving session.
Founder chat to evaluate culture fit and ownership mindset.
Paid work trial (1–2 weeks) tackling a real production RL/fine-tuning problem.

Будьте осторожны: если работодатель просит войти в их систему, используя iCloud/Google, прислать код/пароль, запустить код/ПО, не делайте этого - это мошенники. Обязательно жмите "Пожаловаться" или пишите в поддержку. Подробнее в гайде →