Company hidden
16 hours ago

Research Scientist, Web Data (AI)

Work format
onsite
Employment type
fulltime
Grade
senior
English
B2
Country
UK

Job description

TL;DR

Research Scientist, Web Data (AI): Improving the web data pipeline, including scraping, data filtering, and adding new data sources for LLM training. Focus on identifying areas for improvement, developing measurements of weakness, and working with partner teams to leverage existing solutions.

Location: London, UK

Company

hirify.global is a team of scientists, engineers, and machine learning experts advancing the state of the art in artificial intelligence for widespread public benefit and scientific discovery.

What you will do

  • Investigate current results to identify areas for improvement based on user feedback or weak evaluation performance.
  • Develop measurements of weakness, either as model evaluation or data pipeline statistics, to help drive progress.
  • Set out a medium-term agenda to improve the data pipeline with feedback from peers and key stakeholders.
  • Work with partner teams in GDM (and wider Google) to leverage existing solutions effectively and communicate necessary infrastructure improvements.
  • Execute day-to-day tasks by coding, running experiments, and reviewing contributions.

Requirements

  • 3 years of experience working as a self-directed engineer or researcher.
  • Experience developing large-scale data processing pipelines (>=100M examples) in Python and/or C++.
  • Experience evaluating and investigating (pretrained) LLM performance.

Nice to have

  • Experience filtering data based on heuristic and/or learned signals.
  • Experience working with web data for LLM training, such as cleaning data, removing duplicates, and identifying valuable examples.
  • Experience developing advanced LLM metrics (e.g., execution-based metrics or auto-raters).
