TL;DR
Director, Agentforce Testing Center Engineering (AI): Leading the engineering of a scalable evaluation platform for non-deterministic AI agents, with an emphasis on defining metrics, curating golden datasets, and establishing ground truth. The role focuses on operationalizing theoretical benchmarks into production regression tests and on solving complex challenges in measuring AI agent behavior at scale.
Location: Must be based in the US (San Francisco, Palo Alto, or Bellevue)
Company
hirify.global is the Customer Company, inspiring the future of business with AI + Data + CRM and empowering Trailblazers to drive performance, grow their careers, and improve the state of the world.
What you will do
- Lead the engineering of a scalable evaluation platform that runs in parallel with agent execution.
- Operationalize applied science by turning theoretical benchmarks into production regression tests and establishing eval-driven development.
- Act as the internal Subject Matter Expert for AI testing, educating cross-functional partners on stochastic AI behavior.
- Provide technical leadership and process management, and maintain high-quality code delivery using AI tools.
- Lead and mentor teams, ensuring clear priorities, adequate resources, technical guidance, and career development.
Requirements
- Specialized Agent Evaluation Experience: Building evaluation harnesses for LLMs or Agents.
- Track record of managing "Research Engineering" or "Applied Science" teams that turn scientific goals into shipping code.
- Deep Knowledge of Eval Methodologies: Including LLM-as-a-Judge (validating judges against human ground truth) and Behavioral Analysis.
- Production-Grade AI Experience: Shipping AI products while managing constraints such as token budgets, inference latency, and cost-normalized accuracy.
- Experience building simulation environments (mock APIs, virtual users) to stress-test agents.
- Experience with data engineering, including data acquisition, pipeline creation, metric measurement, and analysis.
- Advanced degree in Computer Science, Machine Learning, or related field with a focus on system evaluation or reliability.
Nice to have
- Familiarity with academic and industry benchmarks and their limitations in a business environment.
- Prior experience working with global teams.
Culture & Benefits
- Inspiring the future of business with AI + Data + CRM.
- Empowerment for individual performance and career growth.
- Focus on using business as a platform for change and doing good.
- Opportunity to be a Trailblazer in the industry.