Senior Member of Technical Staff (AI)
Job description
TL;DR
Senior Member of Technical Staff (AI): Develop large-scale web data pipelines for pre-training frontier language models, with an emphasis on data extraction, parsing, deduplication, and filtering. The role centers on analyzing web data composition, improving data quality to raise model performance, and collaborating with research teams to iterate on training corpora.
Location: Remote-friendly (Global)
Company
A leading AI company dedicated to building and deploying frontier language models for developers and enterprises.
What you will do
- Maintain and scale large-scale pipelines for processing web corpora.
- Develop filtering and quality-scoring systems to identify high-value web documents.
- Analyze web data composition across various domains, languages, and time periods.
- Build and maintain high-performance deduplication pipelines.
- Collaborate with cross-functional research and engineering teams to optimize data for cutting-edge models.
Requirements
- Strong software engineering skills with proficiency in Python.
- Experience building and maintaining data pipelines.
- Familiarity with data processing frameworks such as Apache Spark, Apache Beam, or Pandas.
- Experience working with large-scale web datasets.
- Knowledge of data quality assessment techniques and experimentation with data mixtures.
- Passion for bridging research and engineering to solve complex data challenges in AI.
Nice to have
- Publications at top-tier venues such as NeurIPS, ICML, ICLR, ACL, or EMNLP.
Culture & Benefits
- Open and inclusive work environment.
- Opportunity to work on the cutting edge of AI research.
- Comprehensive health and dental benefits, including mental health support.
- 100% parental leave top-up for up to 6 months.
- 6 weeks of vacation per year.
- Personal enrichment benefits for fitness, well-being, and workspace improvement.