Company hidden
3 months ago

Framework Software Engineer (AI)

Work format
onsite
Employment type
full-time
Grade
middle/senior
English
B2
Country
SK (South Korea)
Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

Framework Software Engineer (AI): Design and optimize high-performance inference frameworks for large-scale distributed serving of LLM workloads, with an emphasis on memory management, parallelism, and throughput. Focus on building scalable multi-node architectures and driving data-informed decisions for NPU/GPU-based systems.

Location: Seongnam, South Korea (Onsite)

Company

The company (name hidden) is an AI semiconductor startup developing high-performance hardware and software solutions for accelerated AI inference.

What you will do

  • Design and develop high-performance inference frameworks for large-scale distributed LLM serving.
  • Optimize end-to-end serving performance metrics including TTFT, ITL, and throughput.
  • Implement advanced techniques like continuous batching, KV-cache management, and speculative decoding.
  • Architect multi-node serving solutions involving prefill/decode disaggregation and distributed caching.
  • Analyze runtime behavior, communication overhead, and memory usage across heterogeneous environments.
  • Collaborate with infrastructure, compiler, and hardware teams to co-design end-to-end AI systems.

Requirements

  • Master's degree or higher in CS, EE, or a related technical field.
  • Strong proficiency in Python, C++, and PyTorch with deep knowledge of runtime internals.
  • Hands-on experience with inference serving or high-performance ML systems.
  • Solid understanding of Linux systems, profiling, and debugging performance bottlenecks.
  • Ability to reason about system-level trade-offs and solve complex architectural problems.
  • Clear communication skills and experience collaborating in fast-paced engineering teams.

Nice to have

  • Experience with serving frameworks like vLLM, SGLang, or TensorRT-LLM.
  • Deep understanding of attention mechanisms and memory-efficient inference.
  • Experience with multi-node inference and tensor/pipeline parallelism.
  • Proven record of open-source contributions to ML infrastructure projects.

Hiring process

  • Document screening followed by an online interview.
  • On-site interview including a technical assignment presentation.
  • Culture-fit interview and final offer discussion.