Posted 2 days ago

Software Engineer (Safeguards Evals)

$320,000 - $485,000
Work format: hybrid
Employment type: full-time
Grade: senior
English: B2
Country: US

Job description

TL;DR

Software Engineer (Safeguards Evals) (AI): Build evaluation infrastructure for an agentic investigation system that detects misuse of Claude, with an emphasis on long-horizon agent metrics, high-quality eval datasets built from real traffic, and production-grade regression and release pipelines. The focus is on measuring end-to-end detection and investigation quality, identifying coverage gaps, and constructing RL environments to improve safety investigation capabilities.

Location: San Francisco, CA | New York City, NY

Salary: $320,000 - $485,000 USD (annual)

Company

Anthropic builds reliable, interpretable, and steerable AI systems with a focus on safety and beneficial outcomes.

What you will do

  • Build and own the evaluation harness for an agentic investigation system, defining metrics, test cases, and grading approaches for long-horizon agents.
  • Construct high-quality eval datasets representing real-world misuse across harm areas using real traffic patterns and synthetic generation.
  • Measure agent performance end-to-end (precision/recall, investigation quality, robustness) and drive improvements on the hardest harm areas.
  • Analyze coverage to find measurement gaps and evolve evals to stay unsaturated and high-signal as capabilities advance.
  • Productionize research into regression and release pipelines that run on every agent change, prompt update, and underlying model upgrade.
  • Build tooling that lets policy experts author, run, and iterate on evaluations without engineering support.
  • Construct RL environments to improve safety investigation capabilities.

Requirements

  • Proficiency in Python and comfort working across the stack.
  • Experience building and maintaining data pipelines.
  • Experience working with LLMs and understanding capabilities and failure modes, especially agentic systems with tool use and multi-step reasoning.
  • Strong data analysis skills to derive reliable insights from large datasets.
  • Ability to move between research prototyping and production-quality code.
  • Ability to translate ambiguous problems into concrete, testable experiments.

Culture & Benefits

  • Hybrid policy: expected to be in one of the offices at least 25% of the time.
  • Visa sponsorship available; reasonable efforts made to support visas when an offer is made.
  • Generous vacation and parental leave, flexible working hours, and competitive compensation and benefits.
  • Optional equity donation matching and a collaborative office environment.

Hiring process

  • Recruiter outreach comes from @anthropic.com email addresses; to avoid scams, verify openings on the official careers page.
  • The application process includes guidance on using AI while applying.

Be careful: if an employer asks you to sign in to their system with iCloud/Google, send a code or password, or run code or software, do not do it; these are scammers. Be sure to click "Report" or contact support.