Job description
TL;DR
Software Engineer (Safeguards Evals) (AI): Build evaluation infrastructure for an agentic investigation system that detects misuse of Claude, with an emphasis on long-horizon agent metrics, high-quality eval datasets built from real traffic, and production-grade regression/release pipelines. The role focuses on measuring end-to-end detection and investigation quality, identifying coverage gaps, and constructing RL environments to improve safety investigation capabilities.
Location: San Francisco, CA | New York City, NY
Salary: $320,000 - $485,000 USD (annual)
Company
Anthropic builds reliable, interpretable, and steerable AI systems with a focus on safety and beneficial outcomes.
What you will do
- Build and own the evaluation harness for an agentic investigation system, defining metrics, test cases, and grading approaches for long-horizon agents.
- Construct high-quality eval datasets representing real-world misuse across harm areas using real traffic patterns and synthetic generation.
- Measure agent performance end-to-end (precision/recall, investigation quality, robustness) and drive improvements on the hardest harm areas.
- Analyze coverage to find measurement gaps and evolve evals to stay unsaturated and high-signal as capabilities advance.
- Productionize research into regression and release pipelines that run on every agent change, prompt update, and underlying model upgrade.
- Build tooling that lets policy experts author, run, and iterate on evaluations without engineering support; construct RL environments to improve safety investigation capabilities.
Requirements
- Proficiency in Python and comfort working across the stack.
- Experience building and maintaining data pipelines.
- Experience working with LLMs and an understanding of their capabilities and failure modes, especially in agentic systems with tool use and multi-step reasoning.
- Strong data analysis skills to derive reliable insights from large datasets.
- Ability to move between research prototyping and production-quality code.
- Ability to translate ambiguous problems into concrete, testable experiments.
Culture & Benefits
- Hybrid policy: expected to be in one of the offices at least 25% of the time.
- Visa sponsorship available; reasonable efforts made to support visas when an offer is made.
- Generous vacation and parental leave, flexible working hours, and competitive compensation and benefits.
- Optional equity donation matching and a collaborative office environment.
Hiring process
- Recruiter outreach comes from @anthropic.com email addresses; to avoid scams, verify openings via the official careers page.
- The application process includes guidance on using AI while applying.