QA Lead (AI Brand Evaluation)
Job description
TL;DR
QA Lead (AI Brand Evaluation): Own the end-to-end QA and evaluation strategy for agentic and AI-powered product features, with an emphasis on turning ambiguous business requirements into robust evaluation criteria and scoring systems. Focus on building validation frameworks for non-deterministic AI outputs, curating golden test sets, and running high-volume adversarial and ambiguity testing.
Location: Remote within Colombia
Company
is a design and technology company building intelligent, shoppable brand experiences using its proprietary platform.
What you will do
- Lead end-to-end QA and evaluation strategy for multi-agent architectures and high-volume data pipelines.
- Translate business objectives into data-driven evaluation criteria, operational rubrics, and scoring methodologies.
- Oversee brand scoring architecture and ensure automated systems produce precise, reliable business metrics from massive datasets.
- Build tools, frameworks, and validation pipelines for non-deterministic AI outputs, including golden test set curation.
- Establish governance and risk-adaptive guardrails for data precision, PII compliance, and logical reasoning across agentic workflows.
- Drive adversarial (red-teaming) and ambiguity testing; monitor observability dashboards for latency, data drift, and accuracy trends.
Requirements
- Location: Must be based in Colombia (remote)
- 8+ years of QA engineering, systems analysis, or software testing experience, including 2+ years in a lead/strategic role.
- Proven ability to lead QA initiatives in fast-paced environments with ambiguous requirements and to bring structure to them.
- Strong data/statistical mindset: move beyond pass/fail toward trend analysis, composite scoring, and statistical evaluation.
- Hands-on scripting and data querying experience with Python, JavaScript, or SQL to build automated evaluation pipelines.
- Operational fluency with LLM behaviors and agentic workflows (e.g., hallucinations, context limits, instruction adherence).
Nice to have
- Experience with LLM evaluation frameworks (e.g., Ragas, LangSmith, TruLens) or prompt engineering.
- Familiarity with cloud data warehouses and analytics platforms (e.g., BigQuery, Databricks, Google Cloud).
- Experience building internal QA automation tools and lightweight validation scripts.
Culture & Benefits
- Remote role for candidates based in Colombia.
- Inclusive, equal-opportunity hiring and barrier-free recruitment process.
- Opportunity to build intelligent, brand-focused experiences with emerging technologies.
Hiring process
- Interviews to assess QA leadership, evaluation strategy, and experience with AI/LLM evaluation.
- Discussion of approach to building scoring systems, golden datasets, and adversarial testing frameworks.