2 months ago

Member of Engineering (Pre-training / Data Engineering, AI)

Work format

Remote (Europe/United States only)

Employment type

Full-time

Grade

Senior

English

Country

UK/US

Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

Member of Engineering (Pre-training / Data Engineering): Architecting and maintaining high-performance pipelines that process trillions of raw tokens into high-quality datasets for foundation models, with an emphasis on ingestion, deduplication, streaming systems, and petabyte-scale data handling. The role focuses on algorithmic sorting, distributed pipeline optimization, and bridging raw web crawls to GPU clusters, work that directly influences model performance.

Location: Remote (EMEA/East Coast); London, UK; Remote (EMEA)

Company

Poolside is an AI company building agentic systems and coding assistants powered by frontier models to accelerate software development towards AGI for security-conscious enterprises.

What you will do

Build and maintain high-performance pipelines for processing trillions of tokens into diverse, high-quality datasets for pre-training foundation models and coding agents.
Engineer ingestion, deduplication, and streaming systems handling petabyte-scale data from raw web crawls to GPU clusters.
Optimize data modeling, algorithmic sorting, and distributed pipelines to enhance model performance.
Collaborate closely with the Pre-training, Post-training, Evals, and Product teams to align datasets with model capabilities and use cases.

Requirements

Strong background in production-grade, distributed data systems for machine learning.
Experience with orchestration tools like Slurm, Airflow, or Dagster.
Experience with observability and reliability practices: CI/CD, Grafana, Prometheus.
Infra skills: Git, Docker, k8s, cloud managed services, batch inference (e.g., vLLM).
Expert-level Python, strong algorithmic foundations, proficiency with Polars, Dask, or PySpark.
Performance obsession with large-scale GPU clusters and distributed pipelines.

Nice to have

Experience building trillion-scale SOTA pretraining datasets.
Translating research to production at scale.
Experience with OCR, web crawling, or evals.
Prior experience pre-training LLMs.

Culture & Benefits

Fully remote work with flexible hours.
37 days/year of vacation & holidays.
Health insurance allowance for you & dependents.
Company-provided equipment, well-being, always-be-learning & home office allowances.
Frequent team get-togethers including monthly 3-day collaboration in Paris (Mon-Wed, open invitation to stay longer) and annual off-sites.
Diverse, inclusive, people-first culture with a low-ego, kind-hearted team focused on collaboration and mission.

Hiring process

Intro call with a Founding Engineer.
Technical interview(s) with a Founding Engineer.
Team fit call with the People team.
Final interview with a Founding Engineer.
