Senior Software Engineer (ML Network Stack)
Job description
TL;DR
Senior Software Engineer (ML Network Stack): Develop the network stack for EC2 distributed AI/ML systems, with an emphasis on NCCL, NVSHMEM, and high-speed network interconnects. The role focuses on building infrastructure for massive testing workloads, automating software delivery via CI/CD, and optimizing performance for the largest AI models.
Location: Tel Aviv, Israel
Company
Annapurna Labs, part of AWS, specializes in designing hardware and software components critical for EC2 infrastructure optimization.
What you will do
- Develop and maintain the network stack for EC2 distributed AI/ML systems, supporting frameworks like NCCL, NVSHMEM, and NIXL.
- Build and maintain infrastructure to monitor and report on the functionality and performance of massive testing workloads at scale.
- Automate software delivery using internal CI/CD tools, Linux, and AWS products.
- Write Python code to orchestrate large clusters and run benchmarks for ML and HPC workloads.
- Utilize AWS Managed Grafana and Athena to analyze performance data and create dashboards for stakeholders.
- Invent automatic mechanisms to detect functional and performance regressions before they reach customers.
Requirements
- 5+ years of professional software development experience.
- 5+ years of experience leading design or architecture for scalable and reliable systems.
- 5+ years of experience in the full software development life cycle, including coding standards, reviews, and build processes.
- 3+ years of experience as a mentor, tech lead, or leading engineering teams.
- 3+ years of experience in SW/HW Co-Design.
- Must be based in Tel Aviv, Israel.
Nice to have
- Bachelor's degree in computer science or equivalent.
- Experience creating automated dashboards and visualization tools such as Grafana.
Culture & Benefits
- Strong commitment to work-life harmony and flexibility.
- Extensive knowledge-sharing, mentorship, and career-advancing resources.
- Inclusive culture that values diverse experiences and non-traditional career paths.
- Opportunity to work at the forefront of AI/ML with the largest clusters and models.