AI Systems Performance Engineer (AI Fabric)
Job Description
TL;DR
AI Systems Performance Engineer (AI Fabric): Drive performance benchmarking of AI inference, training, and storage workloads, with an emphasis on Ethernet fabric optimization. The role focuses on running rigorous benchmarks, isolating complex system bottlenecks, and tuning network parameters for maximum throughput and minimum latency.
Location: Onsite in San Jose, CA, USA
Salary: $141,300 - $226,000
Company
A global technology leader that designs, develops, and supplies a broad range of semiconductor and infrastructure software solutions.
What you will do
- Run industry-standard AI performance benchmarks, with a strong emphasis on MLPerf and NCCL tests.
- Tune and optimize Ethernet fabric parameters to ensure seamless data flow for distributed AI workloads.
- Identify and troubleshoot complex performance bottlenecks across Linux OS, server hardware, and Ethernet switches.
- Design and implement robust performance testing frameworks and automation tools.
- Collaborate with hardware and software stakeholders to provide actionable improvement recommendations.
Requirements
- Bachelor's or Master's degree in Computer Science, Computer Engineering, Electrical Engineering, or a related field.
- 10-12+ years of related industry experience.
- Deep expertise in Linux operating systems, including system-level performance tuning.
- Strong proficiency in Python and C++.
- Knowledge of PyTorch and how AI models consume compute and network resources.
- Proven experience in performance testing and validating Ethernet switch systems.
Nice to have
- Experience with RDMA and RoCEv2.
- Experience building CI/CD pipelines for automated hardware or software performance regression testing.
- Familiarity with Docker and Kubernetes in AI deployments.
Culture & Benefits
- Competitive annual base salary with discretionary bonuses and equity grants.
- Comprehensive medical, dental, and vision plans.
- 401(k) participation with company matching.
- Employee Stock Purchase Program (ESPP) and Employee Assistance Program (EAP).
- Paid sick leave, vacation time, and company-paid holidays.