Company hidden
7 months ago

Member Of Technical Staff (Pre-Training Data)

Work format
remote (Global)
Employment type
full-time
English
B2
Country
France/UK/US +1 more
Job from Hirify Global, a list of international tech companies

Job description

TL;DR

Member Of Technical Staff (Pre-Training Data) (AI Engineering): Develop and optimize scalable data pipelines and data modeling techniques for diverse datasets to support advanced language model training, with an emphasis on data quality, diversity, and pipeline robustness. The role focuses on designing scalable ingestion, cleaning, and filtering, and on experimenting with data mixtures to improve model performance and training efficiency.

Location: Remote with offices in Toronto, San Francisco, New York, London, Paris, Montreal

Company

hirify.global is a product company focused on training and deploying frontier AI language models to power innovative applications in natural language processing.

What you will do

  • Design and build scalable data pipelines for ingestion, cleaning, filtering, and optimization of diverse datasets including web, code, multilingual, and synthetic data.
  • Conduct data ablations and experiment with data mixtures to assess and enhance data quality and model performance.
  • Develop robust data modeling techniques to structure datasets for optimal training efficiency.
  • Research and implement innovative data curation methods leveraging company infrastructure.
  • Collaborate with researchers and engineers to meet the demands of cutting-edge language models.

Requirements

  • Location: Remote-friendly with no restrictions on candidate location
  • Strong software engineering skills with proficiency in Python and data pipeline development.
  • Experience with data processing frameworks such as Apache Spark, Apache Beam, or Pandas.
  • Experience working with large-scale datasets including web data, code data, and multilingual corpora.
  • Knowledge of data quality assessment and experimentation with data mixtures.
  • Passion for bridging research and engineering in AI model training.

Nice to have

  • Publications at top-tier AI and ML conferences (NeurIPS, ICML, ICLR, etc.).

Culture & Benefits

  • Open and inclusive culture with a focus on diversity.
  • Work closely with a leading AI research team.
  • Weekly lunch stipend, in-office lunches, and snacks.
  • Full health and dental benefits including mental health budget.
  • 100% parental leave top-up for up to 6 months.
  • Personal enrichment benefits for arts, fitness, and workspace improvement.
  • Remote-flexible with offices in multiple major cities and co-working stipend.
  • 6 weeks of vacation (30 working days).
