TL;DR
Sr. Research Engineer, Machine Learning, AGI Foundations (AI): Lead the development of industry-leading multimodal large language foundation models (LLMs), with an emphasis on novel algorithms and modeling techniques. Focus on scaling pre- and post-training workflows and building efficient models.
Location: USA, CA, Sunnyvale; USA, MA, Boston; USA, NY, New York; USA, WA, Bellevue
Salary: 193,300.00 - 261,500.00 USD annually (Sunnyvale); 168,100.00 - 227,400.00 USD annually (Boston); 184,900.00 - 250,200.00 USD annually (New York); 168,100.00 - 227,400.00 USD annually (Bellevue)
Company
hirify.global is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status.
What you will do
- Pre-train and post-train multimodal LLMs.
- Scale model training on very large GPU and AWS Trainium clusters
- Optimize training workflows using distributed training/parallelism techniques
- Optimize low-level details of the training stack, including CUDA kernels, communication collectives, and network I/O.
- Use, build on, and extend industry-leading frameworks (NeMo, Megatron Core, PyTorch, JAX, vLLM, TRT, etc.)
- Work with other team members to investigate design approaches, prototype new technologies and scientific techniques, and evaluate technical feasibility
Requirements
- 5+ years of non-internship professional software development experience
- 5+ years of programming experience with at least one programming language
- 5+ years of experience leading the design or architecture (design patterns, reliability, and scaling) of new and existing systems
- Experience as a mentor or tech lead, or leading an engineering team
Nice to have
- 5+ years of experience with the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations
- Master's degree in machine learning or equivalent
- Hands-on experience and expertise in training foundation models/LLMs and/or low-level optimization of ML training workflows, CUDA kernels, and network I/O.
Culture & Benefits
- Comprehensive benefits, including health insurance (medical, dental, vision, prescription), Basic Life & AD&D insurance with an option for supplemental life plans, EAP, mental health support, a medical advice line, Flexible Spending Accounts, and adoption and surrogacy reimbursement coverage
- 401(k) matching
- Paid time off and parental leave