SDE IV - GPU Engineer (AI)
Job description
TL;DR
SDE IV - GPU Engineer (AI): Lead design and optimization efforts across the GPU inference stack for Stable Diffusion, multimodal transformers, and video generation models, with an emphasis on architecting high-performance inference runtimes, kernel dispatchers, and memory planners. Focus areas include driving multi-GPU parallelism strategies, establishing company-wide GPU optimization standards, and collaborating with research on scalable implementations of novel architectures.
Location: Onsite in Bangalore, India
Company
AI is an AI commerce platform shaping the next wave of e-commerce with inspiration-led shopping, backed by Google, Jio Platforms, and Mithril Capital.
What you will do
- Architect high-performance inference runtimes, kernel dispatchers, and memory planners for large diffusion and transformer workloads.
- Lead investigations into cross-GPU performance bottlenecks and scheduling inefficiencies.
- Drive multi-GPU parallelism strategies including model, pipeline, and tensor parallelization.
- Establish company-wide GPU optimization standards, tooling, and SLIs.
- Collaborate with research to design scalable implementations of novel architectures.
- Mentor engineers in profiling, tuning, and low-level optimization.
Requirements
- 5+ years in high-performance computing, GPU runtime systems, or ML infrastructure.
- Proven expertise in CUDA / Triton / C++, with deep understanding of GPU scheduling, occupancy, register usage, and tensor cores.
- Experience building and maintaining distributed inference or training systems.
- Ability to design abstractions balancing flexibility and performance.
- Strong knowledge of NCCL, NVLink, PCIe, and other interconnects.
- Familiarity with profiling automation and performance dashboards.
- Excellent technical leadership and mentoring capabilities.
Nice to have
- Background in compiler-aided optimization (TVM, XLA, MLIR, Triton).
- Experience tuning Stable Diffusion or transformer inference pipelines.
- Exposure to heterogeneous compute backends (AMD ROCm, TPU, ASICs).
- Experience working with hardware–software co-design initiatives.
- Open-source or research contributions in GPU optimization.
Culture & Benefits
- Flexible work arrangements to support work-life balance.
- Salad bar and nutritious meals provided to support a healthy lifestyle.
- Fitness events and a play arena for sports enthusiasts.