AI Researcher (Multilingual Data)
Job description
TL;DR
AI Researcher (NLP/ML): Build and scale next-generation language models across diverse languages and domains, with an emphasis on data sourcing, curation, and training strategies for multilingual and low-resource languages. The role focuses on publishing high-quality research at top venues and translating those insights into production-ready systems.
Location: Remote (World)
Company
A fast-moving startup building frontier language models, with a focus on high-quality research and real-world impact.
What you will do
- Design and execute research on multilingual datasets, including collection, filtering, deduplication, and quality measurement.
- Develop augmentation, sampling, and curriculum design strategies for low-resource and long-tail languages.
- Improve cross-lingual transfer, alignment, and robustness in large language models.
- Build and maintain evaluation benchmarks for multilingual performance.
- Collaborate with engineers on model architecture decisions and training pipelines.
- Publish research at top venues such as ACL, EMNLP, NeurIPS, ICML, and ICLR.
Requirements
- Strong background in NLP/ML research with a focus on multilingual or cross-lingual modeling.
- Proven publication record at respected conferences or journals (e.g., NeurIPS, ICML, ICLR).
- Experience working with large-scale text datasets across multiple languages.
- Deep understanding of tokenization, vocabulary design, and data quality metrics.
- Proficiency in Python with modern ML frameworks such as PyTorch or JAX.
- Ability to operate independently and ship results at a startup pace.
Nice to have
- Experience with non-Latin scripts or low-resource languages.
- Contributions to open-source NLP or data tooling.
- Direct experience in training or evaluating large language models.
- Familiarity with multilingual benchmarks like XTREME, FLORES, or TyDi QA.
Culture & Benefits
- Direct ownership over research direction and product impact.
- Culture that equally values academic publishing and production deployment.
- Access to large-scale datasets and modern infrastructure for fast iteration.
- Competitive compensation and early-stage equity.