Company hidden
5 hours ago

Research Engineer, RL Infrastructure and Reliability (Knowledge Work) (AI)

$350,000 - $850,000
Work format: hybrid
Employment type: full-time
Grade: senior
English: B2
Country: US
Listed in Hirify.Global, a list of international tech companies

Job description

TL;DR

Research Engineer, RL Infrastructure and Reliability (Knowledge Work): Own the reliability, observability, and infrastructure foundation for Knowledge Work training environments and evaluations, with an emphasis on proactive hardening, stress-testing at scale, and high-signal metrics. Focus on building stable systems, automating operational tooling, and driving incidents to resolution for partner teams.

Location: San Francisco, CA (hybrid policy: at least 25% time in office)

Salary: $350,000 - $850,000 USD

Company

Fast-growing AI research organization building reliable, interpretable, and steerable AI systems such as Claude.

What you will do

  • Serve as dedicated reliability owner for Knowledge Work training environments, providing continuity and reducing operational overhead
  • Own clean, canonical evaluation tools and processes, including for model releases
  • Build and automate observability, dashboards, and tooling with emphasis on trusted metrics and alerts
  • Proactively harden systems via load testing, fault injection, and stress testing at realistic scale
  • Act as primary contact for partner teams on environment issues and drive incident resolution
  • Reduce operational burden on researchers to keep them focused on research

Requirements

  • Highly experienced Python engineer shipping reliable, well-instrumented production code
  • Demonstrated experience operating ML or distributed systems at scale, including on-call and incident response
  • Strong SRE or production-engineering mindset with SLOs, load tests, and failure injection
  • Foundational ML knowledge to understand training environments, evaluations, and integrity issues
  • Able to read research code and reason about evaluation integrity
  • Bachelor’s degree or equivalent in relevant field

Nice to have

  • 5+ years operating ML or distributed systems at scale
  • Experience with RL environments, agent harnesses, or LLM evaluation frameworks
  • Familiarity with reward modeling, evaluation design, or reward hacking mitigation
  • Experience with observability stacks, dashboard tooling, chaos engineering, or large-scale load testing
  • Background in data quality pipelines, drift detection, or evaluation curation
  • Familiarity with large-scale training/inference infrastructure
  • Prior role as reliability or operations owner in research team

Culture & Benefits

  • Collaborative team that treats high-impact AI research as big science
  • Competitive compensation, equity donation matching, generous vacation and parental leave
  • Flexible working hours and lovely office space in San Francisco
  • Visa sponsorship available (with reasonable effort and immigration lawyer support)
  • Emphasis on diverse perspectives and representation in AI development
