Company hidden
Posted 2 months ago

NPU SDK Software Engineer (AI)

Job type: full-time
Grade: senior
English: B2
Country: South Korea

Job description

TL;DR

NPU SDK Software Engineer (AI): Own the end-to-end SDK user experience for bringing state-of-the-art AI models (LLM, Multi-modal, Vision, Speech) onto the company's NPU, with an emphasis on model porting, CI/CD pipelines, framework integration, and developer documentation. Focus on optimizing performance, ensuring numerical parity, designing verification tools, and integrating with the Hugging Face and vLLM ecosystems.

Location: company office in Bundang-gu, Seongnam-si, South Korea (경기도 성남시 분당구 정자일로156번길 6, R-TOWER 3F ~ 8F)

Company

AI hardware company developing Neural Processing Units (NPU) for edge AI inference.

What you will do

  • Design, build, and operate CI/CD pipelines to automate SDK release processes
  • Port and optimize deep learning models (LLM, Multi-modal) for the company's NPU, analyze performance bottlenecks, and validate results against GPU references
  • Integrate the SDK with open-source ecosystems such as Hugging Face Transformers, Diffusers, and vLLM
  • Design model-feeding frameworks and debugging utilities for functional correctness and stability
  • Author developer documentation, including API references, tutorials, model matrices, and release notes

Requirements

  • Master’s degree or higher in Computer Science, Electrical Engineering, or a related field
  • Deep knowledge of CI/CD tools (GitHub Actions, Airflow, Buildkite)
  • Strong understanding of deep learning architectures (LLMs, Multi-modal, Vision, Speech)
  • Familiarity with PyTorch internals, model customization, and graph transformations
  • High-performance development in Python and modern C++ (C++17 or later)
  • Experience with Hugging Face libraries and vLLM integration
  • Hands-on experience with Python and Kubernetes for AI inference workloads

Nice to have

  • 5+ years of relevant experience
  • Model optimization on hardware (CUDA, TensorRT, MLIR, Triton, TVM)
  • Scalable ML infrastructure with Docker/Kubernetes
  • AI Harness Engineering experience
  • Open-source contributions
  • Knowledge of JAX or TensorFlow internals

Hiring process

  • Document screening > Online interview > On-site interview > Culture-fit interview > Compensation discussion > Final offer
  • Process may vary by role and schedule
  • Results sent via email
