Posted 2 days ago

Software Engineer (Safeguards Evals)

$320,000 - $485,000

Work format

hybrid

Employment type

full-time

Level

senior

English

Country

Listing from Hirify Global, a list of international tech companies

Job description

TL;DR

Software Engineer (Safeguards Evals) (AI): Build evaluation infrastructure for an agentic investigation system that detects misuse of Claude, with an emphasis on long-horizon agent metrics, high-quality eval datasets drawn from real traffic, and production-grade regression and release pipelines. The focus is on measuring end-to-end detection and investigation quality, identifying coverage gaps, and building RL environments to improve safety investigation capabilities.

Location: San Francisco, CA | New York City, NY

Salary: $320,000 - $485,000 USD (annual)

Company

Anthropic builds reliable, interpretable, and steerable AI systems with a focus on safety and beneficial outcomes.

What you will do

Build and own the evaluation harness for an agentic investigation system, defining metrics, test cases, and grading approaches for long-horizon agents.
Construct high-quality eval datasets representing real-world misuse across harm areas using real traffic patterns and synthetic generation.
Measure agent performance end-to-end (precision/recall, investigation quality, robustness) and drive improvements on the hardest harm areas.
Analyze coverage to find measurement gaps and evolve evals to stay unsaturated and high-signal as capabilities advance.
Productionize research into regression and release pipelines that run on every agent change, prompt update, and underlying model upgrade.
Build tooling that lets policy experts author, run, and iterate on evaluations without engineering support.
Construct RL environments to improve safety investigation capabilities.

Requirements

Proficiency in Python and comfort working across the stack.
Experience building and maintaining data pipelines.
Experience working with LLMs and understanding their capabilities and failure modes, especially in agentic systems with tool use and multi-step reasoning.
Strong data analysis skills to derive reliable insights from large datasets.
Ability to move between research prototyping and production-quality code.
Ability to translate ambiguous problems into concrete, testable experiments.

Culture & Benefits

Hybrid policy: expected to be in one of the offices at least 25% of the time.
Visa sponsorship available; reasonable efforts made to support visas when an offer is made.
Generous vacation and parental leave, flexible working hours, and competitive compensation and benefits.
Optional equity donation matching and a collaborative office environment.

Hiring process

Recruiter outreach comes from @anthropic.com addresses; to avoid scams, verify openings on the official careers page.
The application process includes guidance on using AI while you apply.

Be careful: if an employer asks you to sign in to their system with iCloud/Google, send a code or password, or run code or software, do not do it; these are scammers. Be sure to click "Report" or contact support.