Member of Engineering (Pre-training / Data Engineering, AI)
Job description
TL;DR
Member of Engineering (Pre-training / Data Engineering): architect and maintain high-performance pipelines that turn trillions of raw tokens into high-quality datasets for foundation models, with an emphasis on ingestion, deduplication, streaming systems, and petabyte-scale data handling. The focus is on algorithmic sorting, distributed pipeline optimization, and bridging raw web crawls to GPU clusters, directly influencing model performance.
Location: Remote (EMEA/East Coast) or London, UK
Company
An AI company building agentic systems and coding assistants, powered by frontier models, to accelerate software development towards AGI for security-conscious enterprises.
What you will do
- Build and maintain high-performance pipelines for processing trillions of tokens into diverse, high-quality datasets for pre-training foundation models and coding agents.
- Engineer ingestion, deduplication, and streaming systems handling petabyte-scale data from raw web crawls to GPU clusters.
- Optimize data modeling, algorithmic sorting, and distributed pipelines to enhance model performance.
- Collaborate closely with the Pre-training, Post-training, Evals, and Product teams to align datasets with model capabilities and use cases.
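The posting lists deduplication as a core responsibility but does not describe the team's approach. As a purely illustrative sketch (not the company's method), exact deduplication in a streaming pipeline can hash each normalized document and drop any document whose digest has already been seen. The function names `normalize` and `dedup_stream` and the normalization rules are hypothetical.

```python
import hashlib
import unicodedata
from typing import Iterable, Iterator


def normalize(text: str) -> str:
    """Hypothetical normalization: NFKC, lowercase, collapse whitespace,
    so trivially different copies of a page hash to the same digest."""
    text = unicodedata.normalize("NFKC", text)
    return " ".join(text.lower().split())


def dedup_stream(docs: Iterable[str]) -> Iterator[str]:
    """Yield each document the first time its normalized content is seen."""
    seen: set[bytes] = set()
    for doc in docs:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).digest()
        if digest in seen:
            continue  # exact duplicate after normalization: drop it
        seen.add(digest)
        yield doc  # first occurrence: pass it downstream


if __name__ == "__main__":
    crawl = ["Hello  World", "hello world", "Another page", "Hello World"]
    print(list(dedup_stream(crawl)))
```

An in-memory set of digests only works at small scale; at petabyte scale the seen-set would typically be sharded (for example by hash prefix), and near-duplicate detection such as MinHash with locality-sensitive hashing is a common complement. Which of these the team uses is not stated in the posting.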
Requirements
- Strong background in production-grade, distributed data systems for machine learning.
- Experience with orchestration tools like Slurm, Airflow, or Dagster.
- Experience with observability & reliability tooling: CI/CD, Grafana, Prometheus.
- Infra skills: Git, Docker, k8s, cloud managed services, batch inference (e.g., vLLM).
- Expert-level Python, strong algorithmic foundations, and proficiency with Polars, Dask, or PySpark.
- An obsession with performance across large-scale GPU clusters and distributed pipelines.
Nice to have
- Experience building trillion-token-scale, state-of-the-art (SOTA) pre-training datasets.
- Experience translating research into production at scale.
- Experience with OCR, web crawling, or evals.
- Prior experience pre-training LLMs.
Culture & Benefits
- Fully remote work with flexible hours.
- 37 days/year of vacation & holidays.
- Health insurance allowance for you & dependents.
- Company-provided equipment, well-being, always-be-learning & home office allowances.
- Frequent team get-togethers including monthly 3-day collaboration in Paris (Mon-Wed, open invitation to stay longer) and annual off-sites.
- Diverse & inclusive, people-first culture: a low-ego, kind-hearted team focused on collaboration and the mission.
Hiring process
- Intro call with a Founding Engineer.
- Technical interview(s) with a Founding Engineer.
- Team fit call with the People team.
- Final interview with a Founding Engineer.