Company hidden
Posted 8 hours ago

AI Quality Engineer (AI)

Work format: hybrid
Employment type: full-time
Grade: middle
English: B2
Country: US

Job description

TL;DR

AI Quality Engineer (Python/LLM): Design and implement evaluation frameworks to assess the quality of LLM and agentic AI systems, with an emphasis on accuracy, consistency, and safety. The role focuses on building automated test pipelines for agentic workflows and developing tooling to detect regressions in model behavior.

Location: Atlanta, GA, US. Must be eligible to work in the United States without sponsorship. Remote work flexibility available.

Company

hirify.global provides cloud-based software and services to help nonprofits and associations manage communities, fundraising, and operations.

What you will do

  • Design and implement evaluation frameworks (evals) to assess LLM and agentic AI system quality.
  • Build and maintain automated test pipelines covering unit, integration, and end-to-end scenarios.
  • Develop tooling to detect regressions in model behavior and prompt outputs across releases.
  • Define and track AI quality metrics such as hallucination rates and tool-use accuracy.
  • Collaborate with engineers and product managers to identify edge cases and adversarial failure modes.
  • Contribute to prompt evaluation strategies, including red-teaming and bias assessments.

Requirements

  • 3–5 years of professional software engineering or quality engineering experience.
  • Hands-on experience with LLMs or agentic AI systems (e.g., GPT-4, Claude, Gemini).
  • Proficiency in Python for scripting, test automation, and data analysis.
  • Experience designing and running evaluations for generative AI or LLM-powered features.
  • Must be eligible to work in the United States without sponsorship.
  • Strong analytical skills to interpret probabilistic outputs and distinguish regressions from variance.

Nice to have

  • Experience with systematic prompt evaluation methodologies and prompt engineering.
  • Familiarity with AI safety, alignment, and hallucination mitigation guardrails.
  • Exposure to agentic orchestration frameworks like LangChain, LangGraph, AutoGen, or CrewAI.
  • Experience with vector databases or RAG pipelines (e.g., Pinecone, Weaviate).
  • Knowledge of AI monitoring tools such as LangSmith, Weights & Biases, or Arize.

Culture & Benefits

  • Comprehensive Medical, Dental & Vision benefits.
  • 401(k) Savings Plan with company match.
  • Flexible planned paid time off and generous sick leave.
  • Employer-paid parental leave and short-term disability.
  • Inclusive, purpose-driven culture with a focus on work-life balance.
