Staff Research Engineer (AI/LLM)

$230,000 - $322,000

Work format

Remote (USA only)

Employment type

Full-time

Level

Principal

English

Country

Job description


TL;DR

Staff Research Engineer (AI/LLM): Define the technical strategy and architecture of the pre-training data curriculum pipelines for hirify.global's foundational LLMs, with an emphasis on distributed infrastructure, multimodal processing, and mathematical rigor. The role focuses on designing systems that transform hirify.global's unique corpus of conversational data into high-quality training signals, and on engineering solutions that respect the corpus's complex data structures.

Location: Completely remote within the United States.

Salary: $230,000 - $322,000 USD

Company

hirify.global is a community-driven platform that is home to over 100,000 active communities and approximately 116 million daily active unique visitors. The company is building its own hirify.global-native foundational Large Language Models (LLMs).

What you will do

Architect and implement high-throughput, deterministic data sampling systems for distributed training clusters.
Design and execute dynamic curriculum learning strategies, adjusting data distributions during training.
Engineer logic for serializing complex conversational trees (threads, communities) into optimal training contexts.
Formulate and validate statistical hypotheses regarding data mixtures to minimize bias and maximize token quality.
Design "Safety-First" ingestion layers with automated pipelines for PII redaction and toxicity signals.
Bridge research and engineering by translating theoretical sampling insights into robust production infrastructure.

Requirements

8+ years of software engineering experience with a focus on ML infrastructure, data science at scale, or LLM pre-training.
Expert proficiency in Python and distributed data processing frameworks (Ray Data, Spark).
Experience handling unstructured and semi-structured data at scale (text, code, images, video).
Strong mathematical foundation in probability, statistics, and importance sampling theory.
Deep understanding of pre-training dynamics and the impact of data quality/ordering.
Experience working with graph data structures or serializing conversation trees.

Nice to have

Experience with JAX or PyTorch internals related to distributed data loading.
Experience with multimodal datasets and vision-language preprocessing.
Proficiency in Rust or C++ for performance-critical data path optimization.
Published research in active learning or automated data selection.

Culture & Benefits

Comprehensive Healthcare Benefits and Income Replacement Programs.
401k with Employer Match.
Flexible Vacation & Paid Volunteer Time Off and Generous Paid Parental Leave.
Global Benefit programs including professional development and caregiving support.
Family Planning Support, Gender-Affirming Care, and Mental Health & Coaching Benefits.
Opportunity to work in physical office locations in US cities (San Francisco, Los Angeles, New York City & Chicago) if desired.

Hiring process

Interviews may be recorded, transcribed, and summarized by AI, with an opt-out option.
Personal information collected during interviews (identifiers, professional or employment-related information, and sensory data) will be used to evaluate your application and deleted promptly after a hiring decision is made.