Cloud Platform Engineer (NPU/Kubernetes)
Job description
TL;DR
Cloud Platform Engineer (NPU/Kubernetes): Design and operate an on-premises datacenter and Internal Developer Platform (IDP) for next-generation NPU hardware, with an emphasis on consolidating development infrastructure into a shared self-service platform. The role focuses on optimizing Kubernetes compute resources, managing high-speed RDMA networking, and implementing bare-metal provisioning.
Location: Onsite in Seongnam, South Korea
Company
A hardware company building next-generation NPUs and the cloud platforms required to operate them.
What you will do
- Design, build, and operate a Kubernetes-based Internal Developer Platform (IDP) for bare-metal, VM, and container resource provisioning.
- Develop end-to-end platform services using Python, Rust, TypeScript, and React.
- Optimize Kubernetes compute resources, including RDMA networking (RoCE, SR-IOV) and PCI passthrough.
- Automate infrastructure using IaC tools like Terraform, Ansible, and Helm through GitOps workflows.
- Build and maintain a CMDB service to manage internal infrastructure assets.
- Support the software development lifecycle by managing the internal service catalog and CI/CD pipelines.
Requirements
- Working understanding of ML inference and PyTorch workflows.
- Hands-on experience building end-to-end services in Python, Rust, Go, or JS/TS.
- Strong experience in Kubernetes cluster build, operations, and workload/network troubleshooting.
- Pragmatic approach to problem-solving and automating recurring toil.
- Ability to collaborate effectively across hardware, firmware, and application teams.
- Must be based in or able to work onsite in Seongnam, South Korea.
Nice to have
- Experience with virtualization and accelerator scheduling (KubeVirt, Kubernetes device plugins, DRA).
- Linux systems engineering skills, including kernel and driver debugging (KVM/QEMU).
- Knowledge of high-speed networking (RDMA, RoCE, SR-IOV, PCIe/NUMA).
- Experience with LLM inference frameworks such as vLLM.
- Hands-on experience with bare-metal provisioning using Metal3 or Ironic.
Hiring process
- Document screening followed by an online interview.
- On-site interview including a technical assignment.
- Culture-fit interview and final compensation negotiation.