Company hidden
2 months ago

AI Researcher (Multilingual Data)

Work format
remote (Global)
Employment type
full-time
Grade
senior
English
B2
Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

AI Researcher (NLP/ML): Build and scale next-generation language models across diverse languages and domains, with an emphasis on data sourcing, curation, and training strategies for multilingual and low-resource languages. The role centers on publishing high-quality research at top venues and translating those insights into production-ready systems.

Location: Remote (World)

Company

hirify.global is a fast-moving startup building frontier language models with a focus on high-quality research and real-world impact.

What you will do

  • Design and execute research on multilingual datasets, including collection, filtering, deduplication, and quality measurement.
  • Develop augmentation, sampling, and curriculum design strategies for low-resource and long-tail languages.
  • Improve cross-lingual transfer, alignment, and robustness in large language models.
  • Build and maintain evaluation benchmarks for multilingual performance.
  • Collaborate with engineers on model architecture decisions and training pipelines.
  • Publish research at top venues such as ACL, EMNLP, NeurIPS, ICML, and ICLR.

Requirements

  • Strong background in NLP/ML research with a focus on multilingual or cross-lingual modeling.
  • Proven publication record at respected conferences or journals (e.g., NeurIPS, ICML, ICLR).
  • Experience working with large-scale text datasets across multiple languages.
  • Deep understanding of tokenization, vocabulary design, and data quality metrics.
  • Proficiency in Python with modern ML frameworks such as PyTorch or JAX.
  • Ability to operate independently and ship results at a startup pace.

Nice to have

  • Experience with non-Latin scripts or low-resource languages.
  • Contributions to open-source NLP or data tooling.
  • Direct experience in training or evaluating large language models.
  • Familiarity with multilingual benchmarks like XTREME, FLORES, or TyDi QA.

Culture & Benefits

  • Direct ownership over research direction and product impact.
  • Culture that equally values academic publishing and production deployment.
  • Access to large-scale datasets and modern infrastructure for fast iteration.
  • Competitive compensation and early-stage equity.
