This vacancy is archived

Company hidden
Updated 12 hours ago

Lead Engineer, ML Network Stack (AI Engineering)

$168,100–$261,500
Work format
onsite
Employment type
full-time
Grade
lead
English
B2
Country
US

Job description

TL;DR

Lead Engineer (AI): Develop and maintain infrastructure for high-performance network stacks in EC2 distributed AI/ML systems, with an emphasis on large-scale infrastructure automation and performance monitoring. The role focuses on designing reliable systems for massive testing workloads, optimizing communication libraries such as NCCL, and mentoring the team while growing into a technical management role.

Location: Must be based in the USA (Seattle, WA or Cupertino, CA)

Salary: $168,100–$261,500 USD annually

Company

Annapurna Labs is a core unit within AWS focused on designing critical hardware and software infrastructure for cloud computing.

What you will do

  • Lead a team building infrastructure for monitoring and reporting on massive performance testing workloads.
  • Develop and maintain support for communication libraries like NCCL, NVSHMEM, and NIXL.
  • Use Python to automate cluster provisioning and benchmark execution for AI/HPC workloads.
  • Create dashboards with Grafana and Athena to visualize performance regressions.
  • Invent automated mechanisms to proactively alert developers to functional regressions.
  • Provide mentorship and architectural guidance while transitioning into a technical management role.

Requirements

  • 5+ years of non-internship professional software development experience.
  • 5+ years leading system design and architecture with a focus on reliability and scaling.
  • 3+ years of experience as a mentor, tech lead, or engineering team leader.
  • 3+ years of experience in software/hardware co-design.
  • Strong expertise in Linux, networking, and writing high-performance code.
  • Full software development lifecycle experience including CI/CD and operations.

Nice to have

  • Bachelor's degree in computer science or equivalent.
  • Experience with high-speed networking or HPC/RDMA interconnects.
  • Experience with embedded systems development.
  • Experience creating automated dashboards and visualization tools.

Culture & Benefits

  • Comprehensive benefits including medical, dental, and vision insurance.
  • 401(k) retirement plan with company matching.
  • Paid time off and parental leave packages.
  • Culture of flexibility and commitment to work-life harmony.
  • Access to extensive mentorship, knowledge-sharing, and career development resources.