AI GPU Arch Perf Optimization Intern (AI Engineering)

Work format

On-site

Employment type

Full-time

Grade

Trainee

English level

Country

China

Vacancy from Hirify RU Global, a list of companies with Eastern European roots

Job description

TL;DR

AI GPU Arch Perf Optimization Intern (AI Engineering): Optimize core GPU compute kernels and validate GPU IP using real AI workloads, with an emphasis on GEMM, Attention, and operator fusion. The role focuses on identifying compute and memory bottlenecks and supporting hardware/software codesign for next-generation AI accelerators.

Location: On-site presence required in Shanghai or Beijing, PRC

Company

hirify.global's Data Center Group (DCG) delivers Xeon-based solutions and custom x86-based products for general-purpose compute, web services, HPC, and AI-accelerated systems.

What you will do

Analyze and optimize core GPU compute kernels for AI and numerical workloads, such as GEMM and Attention, including operator fusion.
Reproduce representative AI inference and training workloads for GPU IP validation.
Perform GPU performance profiling to identify compute, memory, and pipeline bottlenecks.
Build performance profiles and models to understand architecture-level performance behavior.
Provide workload and kernel-level insights to support GPU architecture design and HW/SW codesign efforts.

Requirements

Currently pursuing a Bachelor's, Master's, or PhD degree in Computer Science, Computer Engineering, Electrical Engineering, or a related technical field.
Proficiency in Python for analysis, experimentation, or tooling.
Solid understanding of AI fundamentals, including common models and algorithms.
Basic knowledge of computer systems, CPU/GPU architecture, memory systems, and performance analysis.
Strong interest in GPU architecture, GPU programming, parallel computing, and performance optimization.

Nice to have

Experience with GPU kernels or programming models such as CUDA, OpenCL, SYCL, or Triton.
Coursework or research exposure to performance optimization, compilers, or parallel computing.
Strong analytical and problem-solving skills with the ability to reason from profiling data.
Interest in AI systems and infrastructure beyond model-level development.

Culture & Benefits

Hands-on exposure to GPU architecture and low-level performance engineering.
Opportunity to directly shape the performance of next-generation hirify.global GPU and AI accelerator platforms.
Collaborative, cross-functional engineering environment.
