Framework Software Engineer (AI)

Work format

onsite

Employment type

fulltime

Grade

middle/senior

English

Country

Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

Framework Software Engineer (AI): Designing and optimizing high-performance inference frameworks for large-scale distributed serving of LLM workloads, with an emphasis on memory management, parallelism, and throughput. Focus on building scalable multi-node architectures and driving data-informed decisions for NPU/GPU-based systems.

Location: Seongnam, South Korea (Onsite)

Company

hirify.global is an AI semiconductor startup developing high-performance hardware and software solutions for accelerated AI inference.

What you will do

Design and develop high-performance inference frameworks for large-scale distributed LLM serving.
Optimize end-to-end serving performance metrics, including time to first token (TTFT), inter-token latency (ITL), and throughput.
Implement advanced techniques like continuous batching, KV-cache management, and speculative decoding.
Architect multi-node serving solutions involving prefill/decode disaggregation and distributed caching.
Analyze runtime behavior, communication overhead, and memory usage across heterogeneous environments.
Collaborate with infrastructure, compiler, and hardware teams to co-design end-to-end AI systems.

Requirements

Master's degree or higher in CS, EE, or a related technical field.
Strong proficiency in Python, C++, and PyTorch with deep knowledge of runtime internals.
Hands-on experience with inference serving or high-performance ML systems.
Solid understanding of Linux systems, profiling, and debugging performance bottlenecks.
Ability to reason about system-level trade-offs and solve complex architectural problems.
Clear communication skills and experience collaborating in fast-paced engineering teams.

Nice to have

Experience with serving frameworks like vLLM, SGLang, or TensorRT-LLM.
Deep understanding of attention mechanisms and memory-efficient inference.
Experience with multi-node inference and tensor/pipeline parallelism.
Proven record of open-source contributions to ML infrastructure projects.

Hiring process

Document screening followed by an online interview.
On-site interview including a technical assignment presentation.
Culture-fit interview and final offer discussion.
