TL;DR
Staff Product Manager (AI Evals): Owns the evaluation framework for AI agents at hirify.global, covering both internal framework development and customer-facing tools for agent assessment. Focus on translating practitioner knowledge of AI/ML evaluation into product features, driving adoption, and defining metrics for agent quality and customer evaluation engagement.
Location: Palo Alto, California or remote within the USA
Company
hirify.global is a leader in enterprise orchestration, providing an AI-powered platform to streamline operations by connecting data, processes, applications, and experiences for 400,000 global customers.
What you will do
- Define and own the evaluation framework for hirify.global's internal AI agent features, driving adoption across teams.
- Build the customer-facing evaluation experience for builders to test, measure, and improve agents created on hirify.global.
- Decide how much evaluation complexity to expose to builders, balancing rigor with approachability.
- Partner with the Build Experience PM to integrate evaluation seamlessly into the builder journey.
- Work with ML engineers and platform teams to ground the framework in technical reality while ensuring accessibility.
- Establish metrics for internal agent quality and customer evaluation adoption, and learn where customers struggle to assess agent performance.
Requirements
- 7+ years in Product Management with hands-on experience writing evaluations for AI/ML systems (agents, LLMs, or similar).
- Track record of shipping technical products to both internal and external users.
- Experience driving adoption of frameworks or practices across engineering teams.
- Strong written and verbal communication skills.
- Bachelor's degree or equivalent experience.
- Practitioner depth in evaluations, including building test suites, designing rubrics, and debugging agent underperformance.
- Strong product management experience, including shipping products, driving roadmaps, and leading cross-functional teams.
- Technical translation ability to make complex evaluation concepts accessible to business technologists without oversimplification.
- Internal influence skills to drive adoption of frameworks and tools across teams and collaborate credibly with ML engineers.
- Comfort defining products from ambiguity, scoping v1s, and iterating based on learnings.
- B2B product sensibility, viewing enterprise conventions as problems to solve.
Nice to have
- Experience with agent architectures, RAG systems, or LLM application development.
- Background in ML engineering, solutions architecture, or technical program management.
- Experience building developer tools or platform products.
- Familiarity with evaluation frameworks (e.g., human eval pipelines, automated benchmarks, red-teaming).
Culture & Benefits
- Flexible, trust-oriented culture that empowers full ownership of roles.
- Innovation-driven company seeking team players who want to actively help build it.
- Emphasis on balancing productivity with self-care.
- Vibrant and dynamic work environment.
- Multitude of benefits (detailed on careers page).
- Recognized as a top enterprise startup and a leader for remote workers.