Eval Engineer (AI)
Job description
TL;DR
Eval Engineer (AI): Build evaluation systems that measure the quality of the company's URL-to-LLM-ready data conversion across millions of websites and edge cases, with an emphasis on metrics, pipelines, datasets, and LLM-as-judge. The role focuses on designing realistic benchmarks, closing feedback loops to models and RL, and integrating evals into CI/CD for production reliability.
Location: San Francisco, CA (Hybrid) OR Remote (Americas, UTC-3 to UTC-10)
Salary: $160,000–$240,000/year (U.S.-based in San Francisco, CA; adjusted fairly based on your country's cost of living) • Equity: 0.01%–0.10%
Company
The company provides an API that converts any URL into clean, structured, LLM-ready markdown or data. It is a fast-growing startup with millions in ARR and 50k+ GitHub stars, spun out from web data infrastructure work at Mendable.
What you will do
- Build and own the full eval stack: define metrics, pipelines, and datasets for scrape, crawl, extract, and map quality across diverse web formats.
- Design benchmarks that reflect the real distribution of customer data, including SPAs, paywalls, dynamic content, and edge cases; create collection and labeling systems.
- Develop LLM-as-judge pipelines, validate against human judgment, build human review tools, and handle failure modes.
- Close the loop: turn eval signals into RL rewards and model training feedback; integrate into CI/CD to catch regressions.
- Run rapid experiments testing hypotheses, interpret results, and communicate clearly to influence product and model decisions.
Requirements
- 3+ years in ML engineering, applied AI, or data quality with production systems
- Have built your own eval infrastructure (pipelines, datasets, rubrics, judges) and have experience running evals at scale.
- Deep knowledge of LLM evaluation methodology, LLM-as-judge correlation with humans, rubrics, inter-rater agreement.
- Strong grasp of what "good" looks like for unstructured web data (markdown, structured extraction schemas).
- U.S. citizenship or a visa required for the SF hybrid role; not applicable for remote.
- Production-minded: balance depth, coverage, cost; fast iteration with clear communication.
Nice to have
- Previous experience at a scraping, automation, or security-focused startup
- Ex-founder
Culture & Benefits
- Remote-first culture with an optional new SF office; collaborate with a distributed team.
- High autonomy and ownership; small team with direct founder access and real impact.
- Unlimited PTO (minimum 3 weeks encouraged); 12 weeks paid parental leave; sabbatical after 4 years.
- Full medical, dental, vision coverage (100% employee, 50% family); 401(k); life/disability insurance.
- Wellness stipend ($100/month); learning budget ($150/year); team offsites; pet insurance; pre-tax benefits.
Hiring process
- Application review and automated assessment (~30 min).
- Intro chat (~25 min); technical interview (~1 hr challenge).
- Interview with founders (~30 min); paid work trial (1–2 weeks on real tasks).
- Fast decision.