Member of Technical Staff - Reinforcement Learning (Infrastructure), AGI Autonomy

$255,000 - $345,000

Work format

onsite

Employment type

fulltime

Grade

senior

English

Country

Job description

Text:

TL;DR

Member of Technical Staff - Reinforcement Learning (Infrastructure), AGI Autonomy: Design, build, and maintain systems for training and evaluating state-of-the-art agent models, with an emphasis on training infrastructure for large-scale reinforcement learning on LLMs. The role focuses on keeping these systems efficient and robust, working across the entire technology stack, and conducting MLSys research to create new techniques and tooling.

Location: Must be based in the USA, specifically Los Angeles County or San Francisco.

Salary: $255,000–$345,000/year

Company

The Amazon AGI SF Lab is focused on developing new foundational capabilities for enabling useful AI agents, combining the agility of a startup with the resources of Amazon.

What you will do

Design, build, and maintain systems for training and evaluating state-of-the-art agent models.
Develop training infrastructure so that large-scale reinforcement learning on LLMs runs efficiently and robustly.
Work across the entire technology stack, including low-level ML systems, job orchestration, and data management.
Analyze, troubleshoot, and profile complex ML systems, identifying and addressing performance bottlenecks.
Work closely with researchers to conduct MLSys research and create new techniques, infrastructure, and tooling.

Requirements

PhD, or Master's degree and 3+ years of applied research experience.
Experience with programming languages such as Python, Java, C++.
Experience with deep learning methods and machine learning.
Experience with training and deploying machine learning systems or troubleshooting and debugging technical systems.
Work authorization for the USA is required.

Nice to have

Experience with various machine learning techniques and parameters that affect their performance.
Experience with large-scale machine learning systems, including profiling and debugging them, and an understanding of system performance and scalability.
Experience with distributed systems, Megatron, vLLM, Ray, and working with GPUs.
Experience with patents or publications at top-tier peer-reviewed conferences or journals.

Culture & Benefits

Work within the Amazon AGI SF Lab, designed to empower AI researchers and engineers with speed and focus.
Benefit from a philosophy that combines the agility of a startup with the resources of Amazon.
Work in a lean team environment that maximizes compute resources per person.
Teams have autonomy to move fast and a long-term commitment to pursue high-risk, high-payoff research.
Receive a total compensation package including equity, sign-on payments, and a full range of medical, financial, and other benefits.