Company hidden
4 months ago

Research Scientist/Engineer (Evaluations)

Work format
onsite
Employment type
full-time
English
B2
Job from the Hirify.Global list (Hirify RU Global, a list of companies with Eastern European roots)

Job description

TL;DR

Research Scientist/Engineer (Evaluations): Develop and run evaluations that help assess the risks posed by scheming AIs, working with frontier AI labs like OpenAI, Anthropic, and Google DeepMind, with an emphasis on building efficient pipelines and automating them. Focus on rigorously testing frontier AI models, diving deep into AI cognition, and designing novel test environments for frontier risks.

Company

hirify.global develops and runs evaluations that help assess the risks posed by scheming AIs.

What you will do

  • Run pre-deployment evaluation campaigns on the most capable AI systems in the world, interacting with new models before anyone else.
  • Deep dive into AI cognition, scanning thousands of model transcripts to surface behavioral patterns.
  • Build new evaluations for frontier risks, from designing novel test environments to scaling them across hundreds of distinct scenarios.
  • Work directly with frontier AI developers, sharing findings and informing deployment decisions for capable AI systems.
  • Automate and improve the evaluation pipeline, rethinking and reshaping it as agent capabilities emerge.

Requirements

  • Strong software engineering skills, with experience shipping and maintaining production Python code.
  • A mindset for process optimisation, continuously improving workflows and reducing friction.
  • Ability to extract signal from large, messy datasets using quantitative analysis and qualitative assessment.
  • Excellent writing and communication skills to convey qualitative and quantitative findings to diverse audiences.
  • Curiosity and experience as an AI power-user, understanding the capabilities and propensities of frontier AI models.

Nice to have

  • Experience using Inspect as a primary evals framework.

Culture & Benefits

  • Opportunity to work with frontier labs like OpenAI, Anthropic, and Google DeepMind.
  • Be among the first to interact with new models.
  • Candidates who do not meet every requirement but believe they are a good fit are strongly encouraged to apply.
  • Self-taught candidates are welcome; a formal background or industry experience is not required.
