Company hidden
Posted 22 hours ago

AI Researcher (Multilingual Data)

Work format: Remote (Global)
Employment type: Full-time
Level: Senior
English: B2
Listing from Hirify Global, a list of international tech companies.

Job description

TL;DR

AI Researcher (NLP/ML): Build and scale next-generation language models across diverse languages and domains, with an emphasis on data sourcing, curation, and training strategies for multilingual and low-resource languages. The role focuses on publishing high-quality research at top venues and translating those insights into production-ready systems.

Location: Remote (World)

Company

hirify.global is a fast-moving startup building frontier language models with a focus on high-quality research and real-world impact.

What you will do

  • Design and execute research on multilingual datasets, including collection, filtering, deduplication, and quality measurement.
  • Develop augmentation, sampling, and curriculum design strategies for low-resource and long-tail languages.
  • Improve cross-lingual transfer, alignment, and robustness in large language models.
  • Build and maintain evaluation benchmarks for multilingual performance.
  • Collaborate with engineers on model architecture decisions and training pipelines.
  • Publish research at top venues such as ACL, EMNLP, NeurIPS, ICML, and ICLR.

Requirements

  • Strong background in NLP/ML research with a focus on multilingual or cross-lingual modeling.
  • Proven publication record at respected conferences or journals (e.g., NeurIPS, ICML, ICLR).
  • Experience working with large-scale text datasets across multiple languages.
  • Deep understanding of tokenization, vocabulary design, and data quality metrics.
  • Proficiency in Python with modern ML frameworks such as PyTorch or JAX.
  • Ability to operate independently and ship results at a startup pace.

Nice to have

  • Experience with non-Latin scripts or low-resource languages.
  • Contributions to open-source NLP or data tooling.
  • Hands-on experience training or evaluating large language models.
  • Familiarity with multilingual benchmarks like XTREME, FLORES, or TyDi QA.

Culture & Benefits

  • Direct ownership over research direction and product impact.
  • A culture that values academic publishing and production deployment equally.
  • Access to large-scale datasets and modern infrastructure for fast iteration.
  • Competitive compensation and early-stage equity.
