Updated 5 days ago

Research Engineer - Data (AI)

$350,000-400,000

Employment type

Full-time

Level

Senior

English

Country

Vacancy from Hirify Global, a list of international tech companies


Job description

TL;DR

Research Engineer - Data (AI): Build and drive the data foundation for research in materials, energy, and the physical sciences, with an emphasis on sourcing scientific datasets, integrating experimental data, and ensuring high-quality inputs for frontier models. The role focuses on designing scalable pipelines, data quality systems, and tooling that supports reproducibility and collaboration among researchers.

Location: Lab in Menlo Park. Candidates based in Menlo Park or San Francisco are preferred, but location is flexible depending on the role.

Compensation: $350,000-400,000 annual base salary, commensurate with experience.

Company

AI and physical sciences company building state-of-the-art models to accelerate breakthroughs across materials, energy, and beyond. Backed by world-class investors and growing rapidly.

What you will do

Own data strategy across the training stack, identifying gaps and shaping the roadmap with research leads.
Source, evaluate, and procure external datasets in chemistry, physics, materials science, and more.
Build pipelines for ingesting, processing, and versioning large-scale heterogeneous datasets.
Design data quality systems including deduplication, filtering, and normalization at scale.
Integrate lab experimental data, simulations, and model outputs into the training stack.
Develop tooling for data inspection, querying, metadata tracking, and reproducibility.
Collaborate with ML engineers on token budgets, data mixing, and curriculum design.

Requirements

Bachelor’s degree or equivalent.
Experience building large-scale data pipelines for LLM pretraining or midtraining.
Expertise in data quality techniques such as MinHash, SimHash, perplexity filtering, and PII scrubbing.
Experience working with and normalizing scientific data formats (papers, patents, simulations, lab exports).
Experience with distributed processing in Spark, Ray, or Dask at TB/PB scale.
Experience with dataset versioning and lineage tracking (e.g., DVC, Delta Lake).
Strong Python skills for production tooling; ability to collaborate closely with ML researchers.
Research mindset: comfortable with experimentation and iteration.

Nice to have

Curating scientific datasets for domain-adaptive pretraining or tuning.
Synthetic data generation and verification.
Background in physical science or engineering.
Multimodal data integration (text, numerical, molecular, spectral).

Culture & Benefits

Visa sponsorship: Yes, with legal support.
Operate at frontier pace with deep expertise, ownership, and drive.
Team of top scientists, engineers, and problem-solvers defining the frontier.



Employer: Periodic Labs


Similar vacancies

Forward Deployed Engineer (AI)

Software Engineer (GenAI)

AI Data Engineer (AI)

Staff Software Engineer (AI)

Staff Software Engineer (Code RL)

Staff Software Engineer, Foundation Model Inference (AI Engineering)

