Company hidden
1 day ago

QA Lead (AI Brand Evaluation)

Work format
remote (Colombia only)
Employment type
fulltime
Grade
lead
English
B2
Country
Colombia

Job description

TL;DR

QA Lead (AI Brand Evaluation): Own end-to-end QA and evaluation strategy for agentic and AI-powered product features, with an emphasis on turning ambiguous business requirements into robust evaluation criteria and scoring systems. The role focuses on building validation frameworks for non-deterministic AI outputs, curating golden test sets, and running adversarial and ambiguity testing at high volume.

Location: Remote within Colombia

Company

hirify.global is a design and technology company building intelligent, shoppable brand experiences using its proprietary platform.

What you will do

  • Lead end-to-end QA and evaluation strategy for multi-agent architectures and high-volume data pipelines.
  • Translate business objectives into data-driven evaluation criteria, operational rubrics, and scoring methodologies.
  • Oversee brand scoring architecture and ensure automated systems produce precise, reliable business metrics from massive datasets.
  • Build tools, frameworks, and validation pipelines for non-deterministic AI outputs, including golden test set curation.
  • Establish governance and risk-adaptive guardrails for data precision, PII compliance, and logical reasoning across agentic workflows.
  • Drive adversarial/red-teaming and ambiguity testing; monitor observability dashboards for latency, data drift, and accuracy trends.

Requirements

  • Location: Must be based in Colombia (remote).
  • 8+ years of QA engineering, systems analysis, or software testing experience, including 2+ years in a lead/strategic role.
  • Proven ability to lead QA initiatives in fast-paced environments with ambiguous requirements and to bring order to them.
  • Strong data/statistical mindset: move beyond pass/fail toward trend analysis, composite scoring, and statistical evaluation.
  • Hands-on scripting and data querying experience with Python, JavaScript, or SQL to build automated evaluation pipelines.
  • Operational fluency with LLM behaviors and agentic workflows (e.g., hallucinations, context limits, instruction adherence).

Nice to have

  • Experience with LLM evaluation frameworks (e.g., Ragas, LangSmith, TruLens) or prompt engineering.
  • Familiarity with cloud data warehouses and analytics platforms (e.g., BigQuery, Databricks, Google Cloud).
  • Experience building internal QA automation tools and lightweight validation scripts.

Culture & Benefits

  • Remote role based in Colombia.
  • Inclusive, equal-opportunity hiring and barrier-free recruitment process.
  • Focus on building intelligent, brand-focused experiences using emerging technologies.

Hiring process

  • Interviews to assess QA leadership, evaluation strategy, and experience with AI/LLM evaluation.
  • Discussion of approach to building scoring systems, golden datasets, and adversarial testing frameworks.
