Company hidden
15 hours ago

Staff Research Engineer (AI/LLM)

$230,000 - $322,000
Work format
remote (USA only)
Employment type
fulltime
Grade
principal
English
b2
Country
US
Vacancy from Hirify RU Global, a list of companies with Eastern European roots

Job description

TL;DR

Staff Research Engineer (AI/LLM): Define the technical strategy and architecture for the pre-training data curriculum pipelines behind hirify.global's foundational LLMs, with an emphasis on distributed infrastructure, multimodal processing, and mathematical rigor. The role focuses on designing systems that turn hirify.global's unique corpus of conversational data into high-quality training signals, and on engineering solutions that respect complex data structures.

Location: Completely remote within the United States.

Salary: $230,000 - $322,000 USD

Company

hirify.global is a community-driven platform home to over 100,000 active communities and approximately 116 million daily active unique visitors, building its own hirify.global-native foundational Large Language Models (LLMs).

What you will do

  • Architect and implement high-throughput, deterministic data sampling systems for distributed training clusters.
  • Design and execute dynamic curriculum learning strategies, adjusting data distributions during training.
  • Engineer logic for serializing complex conversational trees (threads, subhirify.globals) into optimal training contexts.
  • Formulate and validate statistical hypotheses regarding data mixtures to minimize bias and maximize token quality.
  • Design "Safety-First" ingestion layers with automated pipelines for PII redaction and toxicity signals.
  • Bridge research and engineering by translating theoretical sampling insights into robust production infrastructure.

Requirements

  • 8+ years of software engineering experience with a focus on ML infrastructure, data science at scale, or LLM pre-training.
  • Expert proficiency in Python and distributed data processing frameworks (Ray Data, Spark).
  • Experience handling unstructured and semi-structured data at scale (text, code, images, video).
  • Strong mathematical foundation in probability, statistics, and importance sampling theory.
  • Deep understanding of pre-training dynamics and the impact of data quality/ordering.
  • Experience working with graph data structures or serializing conversation trees.

Nice to have

  • Experience with JAX or PyTorch internals related to distributed data loading.
  • Experience with multimodal datasets and vision-language preprocessing.
  • Proficiency in Rust or C++ for performance-critical data path optimization.
  • Published research in active learning or automated data selection.

Culture & Benefits

  • Comprehensive Healthcare Benefits and Income Replacement Programs.
  • 401k with Employer Match.
  • Flexible Vacation & Paid Volunteer Time Off and Generous Paid Parental Leave.
  • Global Benefit programs including professional development and caregiving support.
  • Family Planning Support, Gender-Affirming Care, and Mental Health & Coaching Benefits.
  • Opportunity to work in physical office locations in US cities (San Francisco, Los Angeles, New York City & Chicago) if desired.

Hiring process

  • Interviews may be recorded, transcribed, and summarized by AI, with an opt-out option.
  • Personal information collected during interviews (Identifiers, Professional/Employment-Related, Sensory) will be used for application evaluation and deleted promptly after a hiring decision.

The vacancy text is reproduced from the source.
