Posted 3 months ago

Engineering Manager, Agent Prompts & Evals (AI)

$1 - $2
Work format
hybrid
Employment type
fulltime
Grade
lead
English level
b2
Country
US

Job description

TL;DR

Engineering Manager, Agent Prompts & Evals (AI): Leading the team that owns the infrastructure for shipping model and prompt changes with confidence, including eval frameworks, system prompt pipelines, and regression-detection systems. Focus on measuring model behavior, building collaboration with other teams, and shaping the team's investment in frontier eval development and model launch automation.

Location: San Francisco, CA or New York City, NY. All staff are expected to be in one of the company's offices at least 25% of the time.

Salary: $1 - $2 USD

Company

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems that are safe and beneficial for users and society.

What you will do

  • Lead and grow a team of prompt engineers and platform software engineers.
  • Own the product-side eval platform and system prompt infrastructure, including versioning, deployment, rollback, and review tooling.
  • Be a steady hand through model launches, serving as the backstop when things get chaotic.
  • Build durable collaboration with other evals groups across the company, focusing on ownership boundaries and shared roadmaps.
  • Recruit, close, and retain engineers who want to work at the intersection of product engineering and model behavior.
  • Shape where the team invests next, considering paths into frontier eval development, model launch automation, and deeper prompt engineering support.

Requirements

  • 8+ years in software engineering with 3+ years managing engineering teams, including experience leading a platform, infra, or developer-tooling team.
  • A track record of building tooling and processes that make it easy for other teams to do the right thing.
  • Comfort managing a team with a mixed charter: platform ownership, service-to-other-teams, and a launch-driven operational rhythm.
  • Enough technical depth to engage on system design, review pipeline architecture, and be credible in debates with strong ICs.
  • A product mindset and willingness to wear multiple hats when the work calls for it.
  • Demonstrated ability to build and maintain peer relationships with partner orgs, negotiating ownership and aligning roadmaps.

Nice to have

  • Prior exposure to LLM evals, ML experimentation platforms, or model quality work.
  • Experience with A/B testing infrastructure, feature flagging, or gradual rollout systems.
  • Background in devtools, CI/CD platforms, or testing infrastructure at scale.
  • A history of managing teams that sit between two larger orgs and making that position an asset rather than a liability.
  • Interest in AI safety and alignment.

Culture & Benefits

  • Competitive compensation and benefits.
  • Optional equity donation matching.
  • Generous vacation and parental leave.
  • Flexible working hours.
  • Lovely office space in which to collaborate with colleagues.
