Company hidden
5 months ago

SDE IV - GPU Engineer (AI)

Work format
onsite
Employment type
fulltime
Grade
lead
English
b2
Country
India

Job description

TL;DR

SDE IV - GPU Engineer (AI): Lead design and optimization across the GPU inference stack for Stable Diffusion, multimodal transformers, and video generation models, with an emphasis on architecting high-performance inference runtimes, kernel dispatchers, and memory planners. The role also covers driving multi-GPU parallelism strategies, establishing company-wide GPU optimization standards, and collaborating with research on scalable implementations of novel architectures.

Location: Onsite in Bangalore, India

Company

hirify.global AI is an AI commerce platform shaping the next wave of e-commerce with inspiration-led shopping, backed by Google, Jio Platforms, and Mithril Capital.

What you will do

  • Architect high-performance inference runtimes, kernel dispatchers, and memory planners for large diffusion and transformer workloads.
  • Lead investigations into cross-GPU performance bottlenecks and scheduling inefficiencies.
  • Drive multi-GPU parallelism strategies, including model, pipeline, and tensor parallelism.
  • Establish company-wide GPU optimization standards, tooling, and SLIs.
  • Collaborate with research to design scalable implementations of novel architectures.
  • Mentor engineers in profiling, tuning, and low-level optimization.

Requirements

  • 5+ years in high-performance computing, GPU runtime systems, or ML infrastructure.
  • Proven expertise in CUDA / Triton / C++, with deep understanding of GPU scheduling, occupancy, register usage, and tensor cores.
  • Experience building and maintaining distributed inference or training systems.
  • Ability to design abstractions balancing flexibility and performance.
  • Strong knowledge of NCCL, NVLink, PCIe, and interconnects.
  • Familiarity with profiling automation and performance dashboards.
  • Excellent technical leadership and mentoring capabilities.

Nice to have

  • Background in compiler-aided optimization (TVM, XLA, MLIR, Triton).
  • Experience tuning Stable Diffusion or transformer inference pipelines.
  • Exposure to heterogeneous compute backends (AMD ROCm, TPU, ASICs).
  • Experience working with hardware–software co-design initiatives.
  • Open-source or research contributions in GPU optimization.

Culture & Benefits

  • Flexible work arrangements to support work-life balance.
  • Salad bar and nutritious meals provided for a healthy lifestyle.
  • Fitness events and a play arena for sports enthusiasts.
