Company hidden
2 days ago

Senior Infrastructure Support Engineer (GPUs)

Work format
onsite
Employment type
fulltime
Grade
senior
English
B2
Country
Singapore

Job description

TL;DR

Senior Infrastructure Support Engineer (GPUs): Maintaining and optimizing high-performance GPU cloud infrastructure for AI workloads, with an emphasis on Linux systems engineering, Kubernetes, and high-speed networking. Focus on resolving complex technical incidents, automating operational tasks, and improving system observability.

Location: Singapore (Onsite)

Company

hirify.global is a GPU cloud engineered specifically for AI, providing cost-effective, high-performance infrastructure for AI start-ups and large enterprises.

What you will do

  • Participate in the Support duty rotation, collaborating with Infrastructure, SRE, and Product Engineering on incidents and changes.
  • Proactively improve dashboards, alerts, and runbooks to prevent repeat incidents.
  • Manage and resolve technical tickets while keeping internal and external stakeholders informed.
  • Design and implement automation scripts and tools to optimize operational processes.
  • Conduct root cause analysis (RCA) for major incidents and recommend long-term architectural fixes.
  • Respond to critical incidents outside business hours as part of an on-call rotation.

Requirements

  • Location: Must be based in Singapore and able to provide onsite technical expertise.
  • Expertise in Linux systems engineering at scale, including kernel modules and networking stack troubleshooting.
  • Experience operating and troubleshooting Kubernetes (K8s) clusters.
  • Practical experience with GPU platforms (NVIDIA/AMD), including drivers, nvidia-smi, and NCCL diagnostics.
  • Strong networking fundamentals: L2/L3, BGP, VLANs, VXLAN, and high-performance fabrics (RDMA/NVLink).
  • Proficiency in Bash, Python, or JavaScript, and infrastructure automation tools (Ansible, Terraform, Puppet, or Chef).

Nice to have

  • Experience with automated network deployment and configuration in critical environments.
  • Knowledge of GPU HPC concepts, including InfiniBand, MPI, and Pyxis/Enroot.
  • Experience building CI/CD pipelines using GitOps tooling and GitHub Actions.

Culture & Benefits

  • Culture of relentless innovation, ownership, and high accountability.
  • Environment based on openness, transparency, and candid communication.
  • Customer-centric focus with a commitment to delivering impactful AI solutions.
  • Strong emphasis on sustainability and long-term environmental responsibility.
  • Inclusive workplace committed to equal opportunities for people of diverse backgrounds.
