2 months ago

Research Engineer, Model Evaluations (AI)

$320,000 - $485,000

Work format

hybrid

Employment type

full-time

English

Country

Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

Research Engineer, Model Evaluations (AI): Designing and implementing evaluation frameworks to measure Claude's capabilities and personality, with an emphasis on reliability, scalability, and defensible metrics. Focus on building distributed evaluation platforms, diagnosing training regressions, and developing robust metrics for AI safety and intelligence.

Location: Hybrid (Must be based in or near San Francisco, CA or New York City, NY); expect to be in the office at least 25% of the time.

Salary: $320,000 - $485,000 USD

Company

Anthropic is a public benefit corporation dedicated to creating reliable, interpretable, and steerable AI systems that are safe and beneficial for society.

What you will do

Design and execute evaluations for reasoning, agentic behavior, knowledge, and safety properties of Claude.
Build and harden a distributed evaluation execution platform to run reliably at scale during production RL training.
Develop and maintain dashboards for monitoring model health and detecting regressions during training.
Debug anomalous evaluation results under time pressure to distinguish between model changes and infrastructure issues.
Improve internal tooling, libraries, and workflows used by researchers to iterate on evaluations.
Conduct experiments to analyze how prompting, sampling, and scaffolding affect internal and industry benchmarks.

Requirements

Strong Python programming skills for production or research infrastructure.
Experience building or operating distributed systems, data pipelines, or reliable large-scale infrastructure.
Clear written and verbal communication skills for explaining technical results to non-specialists.
Ability to operate in an on-call or production-support capacity during live training runs.
Bachelor's degree or equivalent experience in a relevant field.
Must be based in the USA to comply with office attendance policies.

Nice to have

Hands-on experience with LLMs (prompting, sampling, scaffolding).
Track record of building trusted data visualization dashboards.
Experience developing robust evaluation metrics for language models.
Background in statistics, experimental design, or ML training infrastructure.
Experience with large-scale dataset sourcing, curation, and processing.

Culture & Benefits

Competitive compensation with optional equity donation matching.
Generous vacation and parental leave policies.
Flexible working hours and collaborative office spaces.
Visa sponsorship available for eligible candidates.
Strongly collaborative environment focused on "big science" and high-impact research.
