Research Engineer, RL Infrastructure and Reliability (Knowledge Work) (AI)

$350,000 - $850,000

Work format

hybrid

Employment type

full-time

Grade

senior

English

Country

Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

Research Engineer, RL Infrastructure and Reliability (Knowledge Work): Own the reliability, observability, and infrastructure foundation for Knowledge Work training environments and evaluations, with an emphasis on proactive hardening, stress-testing at scale, and high-signal metrics. Focus on building stable systems, automating operational tooling, and driving incidents to resolution for partner teams.

Location: San Francisco, CA (hybrid policy: at least 25% time in office)

Salary: $350,000 - $850,000 USD

Company

Fast-growing AI research organization building reliable, interpretable, and steerable AI systems such as Claude.

What you will do

Serve as dedicated reliability owner for Knowledge Work training environments, providing continuity and reducing operational overhead
Own clean, canonical evaluation tools and processes, including for model releases
Build and automate observability, dashboards, and tooling with emphasis on trusted metrics and alerts
Proactively harden systems via load testing, fault injection, and stress testing at realistic scale
Act as primary contact for partner teams on environment issues and drive incident resolution
Reduce operational burden on researchers to keep them focused on research

Requirements

Highly experienced Python engineer shipping reliable, well-instrumented production code
Demonstrated experience operating ML or distributed systems at scale, including on-call and incident response
Strong SRE or production-engineering mindset with SLOs, load tests, and failure injection
Foundational ML knowledge to understand training environments, evaluations, and integrity issues
Able to read research code and reason about evaluation integrity
Bachelor’s degree or equivalent in a relevant field

Nice to have

5+ years operating ML or distributed systems at scale
Experience with RL environments, agent harnesses, or LLM evaluation frameworks
Familiarity with reward modeling, evaluation design, or reward hacking mitigation
Experience with observability stacks, dashboard tooling, chaos engineering, or large-scale load testing
Background in data quality pipelines, drift detection, or evaluation curation
Familiarity with large-scale training/inference infrastructure
Prior role as reliability or operations owner on a research team

Culture & Benefits

Collaborative team focused on high-impact AI research as big science
Competitive compensation, equity donation matching, generous vacation and parental leave
Flexible working hours and lovely office space in San Francisco
Visa sponsorship available (with reasonable effort and immigration lawyer support)
Emphasis on diverse perspectives and representation in AI development
