Research Engineer (AI)

Work format

On-site

Employment type

Full-time

Level

Senior

English

Country

Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

Research Engineer (AI): Build and optimize distributed training infrastructure for large-scale AI models, with an emphasis on GPU cluster performance, experiment orchestration, and data pipelines. The role focuses on diagnosing bottlenecks, implementing advanced parallelism strategies, and keeping systems reliable under heavy training loads.

Location: On-site in the San Francisco Bay Area.

Company

An applied AI lab focused on building end-to-end software agents such as Devin and hirify.global.

What you will do

Build and own distributed systems for training large-scale models reliably across GPU clusters.
Profile and improve end-to-end training throughput by identifying bottlenecks in compute and communication.
Design and maintain experiment orchestration tools to maximize research velocity.
Develop high-throughput, reliable data pipelines for model training and evaluation.
Diagnose complex failures across GPUs, networking, and numerical stability.
Implement and optimize parallelism strategies, including data, tensor, and pipeline parallelism.

Requirements

Deep experience operating distributed training systems for large-scale models.
Strong systems engineering fundamentals covering networking, storage, and distributed compute.
Proficiency in Python and C++, with systems-level experience in PyTorch or JAX.
Hands-on experience with GPU performance profiling and memory optimization.
Understanding of ML model architectures sufficient to support research-specific infrastructure needs.
PhD in CS, ML, Physics, or Mathematics, or equivalent industry experience.

Culture & Benefits

Small, talent-dense team featuring world-class competitive programmers and researchers.
High-impact environment where infrastructure directly accelerates frontier AI research.
Access to massive GPU compute resources with minimal process overhead.
Focus on speed, autonomy, and technical depth in a competitive problem space.
