Company hidden
17 hours ago

Data Acquisition Engineer (AI)

Work format
Remote (Europe/United States only)
Employment type
Full-time
English
B2
Country
UK

Job description

TL;DR

Data Acquisition Engineer (AI): Designing and operating large-scale web crawlers and data pipelines for pre-training frontier LLMs, with an emphasis on distributed systems and high-throughput ingestion. Focus on maximizing data recall from high-value sources, building observability tooling, and aligning sourcing with model training needs.

Location: Remote (EMEA or US East Coast). Includes 3 days of in-person collaboration in Paris each month (Monday to Wednesday).

Company

hirify.global is an AI research company building agentic systems and frontier models to accelerate software development and reach AGI.

What you will do

  • Design, build, and operate a large-scale web crawler responsible for acquiring openly accessible internet data.
  • Develop specialized deep crawlers targeting high-value sources to improve recall and coverage.
  • Own the long-term roadmap for data acquisition in collaboration with data researchers.
  • Build observability, monitoring, and debugging tooling to ensure infrastructure reliability.
  • Collaborate with pre-training, post-training, and evaluations teams to align data priorities.
  • Build high-throughput ingestion pipelines for rapidly onboarding and evaluating partner data.

Requirements

  • Must be based in EMEA or US East Coast.
  • Strong distributed systems background with proven experience building large-scale infrastructure (data pipelines, web crawlers).
  • Proficiency in Python, including performance optimization and debugging complex production systems.
  • Hands-on experience with web crawling, HTTP protocols, and distributed job queues.
  • Familiarity with AWS, Kubernetes, and Docker for managing high-throughput workloads.
  • Knowledge of data privacy, robots.txt adherence, and responsible crawling practices.

Nice to have

  • Prior experience pre-training LLMs.
  • Experience building trillion-scale SOTA pre-training datasets.
  • Experience translating research into production at scale.

Culture & Benefits

  • Fully remote work with flexible hours.
  • 37 days of vacation and holidays per year.
  • 16 weeks of flexible, full-pay parental leave.
  • Health insurance allowance for employees and dependents.
  • Company-provided equipment and allowances for home office and learning.
  • Frequent team gatherings and mandatory monthly off-sites in Paris.

Hiring process

  • Introductory call with a Founding Engineer.
  • Technical interview(s) with Engineering team members.
  • Team fit call with the People team.
  • Final interview with a Founding Engineer.
