TL;DR
Member of Technical Staff - Reinforcement Learning (Infrastructure), AGI Autonomy: Designing, building, and maintaining systems for training and evaluating state-of-the-art agent models, with an emphasis on training infrastructure for large-scale reinforcement learning on LLMs. The role focuses on keeping these systems highly efficient and robust, working across the entire technology stack, and conducting MLSys research to create new techniques and tooling.
Location: Must be based in the USA, specifically Los Angeles County or San Francisco.
Salary: $255,000–$345,000/year
Company
The Amazon AGI SF Lab is focused on developing new foundational capabilities for enabling useful AI agents, combining the agility of a startup with the resources of Amazon.
What you will do
- Design, build, and maintain systems for training and evaluating state-of-the-art agent models.
- Develop training infrastructure so that large-scale reinforcement learning on LLMs runs efficiently and robustly.
- Work across the entire technology stack, including low-level ML systems, job orchestration, and data management.
- Analyze, troubleshoot, and profile complex ML systems, identifying and addressing performance bottlenecks.
- Work closely with researchers to conduct MLSys research and create new techniques, infrastructure, and tooling.
Requirements
- PhD, or Master's degree and 3+ years of applied research experience.
- Experience with programming languages such as Python, Java, or C++.
- Experience with deep learning methods and machine learning.
- Experience with training and deploying machine learning systems or troubleshooting and debugging technical systems.
- Work authorization for the USA is required.
Nice to have
- Experience with various machine learning techniques and parameters that affect their performance.
- Experience with large-scale machine learning systems, including profiling and debugging, and an understanding of system performance and scalability.
- Experience with distributed systems, Megatron, vLLM, Ray, and working with GPUs.
- Experience with patents or publications at top-tier peer-reviewed conferences or journals.
Culture & Benefits
- Work within the Amazon AGI SF Lab, designed to empower AI researchers and engineers with speed and focus.
- Benefit from a philosophy that combines the agility of a startup with the resources of Amazon.
- Work in a lean team environment that maximizes compute resources per person.
- Teams have autonomy to move fast and a long-term commitment to pursue high-risk, high-payoff research.
- Receive a total compensation package including equity, sign-on payments, and a full range of medical, financial, and other benefits.