Назад
Company hidden
2 дня назад

Infrastructure Support Engineer (GPUs)

Π€ΠΎΡ€ΠΌΠ°Ρ‚ Ρ€Π°Π±ΠΎΡ‚Ρ‹
onsite
Π’ΠΈΠΏ Ρ€Π°Π±ΠΎΡ‚Ρ‹
fulltime
Π“Ρ€Π΅ΠΉΠ΄
middle
Английский
b2
Π‘Ρ‚Ρ€Π°Π½Π°
Singapore
Вакансия ΠΈΠ· списка Hirify.GlobalВакансия ΠΈΠ· Hirify Global, списка ΠΌΠ΅ΠΆΠ΄ΡƒΠ½Π°Ρ€ΠΎΠ΄Π½Ρ‹Ρ… tech-ΠΊΠΎΠΌΠΏΠ°Π½ΠΈΠΉ
Для мэтча ΠΈ ΠΎΡ‚ΠΊΠ»ΠΈΠΊΠ° Π½ΡƒΠΆΠ΅Π½ Plus

ΠœΡΡ‚Ρ‡ & Π‘ΠΎΠΏΡ€ΠΎΠ²ΠΎΠ΄

Для мэтча с этой вакансиСй Π½ΡƒΠΆΠ΅Π½ Plus

ОписаниС вакансии

ВСкст:
/

TL;DR

Infrastructure Support Engineer (GPUs): Maintaining and troubleshooting high-performance GPU cloud infrastructure for AI workloads with an accent on service reliability and rapid incident response. Focus on managing Kubernetes clusters, Linux-based systems, and GPU-specific diagnostics to ensure seamless AI development for customers.

Location: Singapore (includes availability to travel to hirify.global or Customer locations)

Company

hirify.global is a GPU cloud provider engineered specifically for AI startups and large enterprises to reduce the complexity of AI development.

What you will do

  • Handle day-to-day tickets and alerts within the support duty rotation, escalating complex incidents to Engineering.
  • Resolve common issues using established runbooks and contribute to their improvement and incremental fixes.
  • Monitor, troubleshoot, and triage platform issues, capturing logs and facts for efficient handover.
  • Collaborate with cross-functional teams and serve as the escalation point for onsite operations staff.
  • Document validated steps and contribute to training materials to build team capability.
  • Identify and implement automation opportunities to optimize support processes.

Requirements

  • 2-4 years of experience in support, operations, or infrastructure engineering, ideally within cloud or Data Centre environments.
  • Proficiency in Linux CLI, system services, filesystems, permissions, and basic networking tools.
  • Solid grasp of networking basics: IP addressing, subnets, VLANs, routing, DNS, and firewalls.
  • Exposure to Kubernetes core concepts (nodes, pods, services, logs) and basic troubleshooting.
  • Familiarity with GPU diagnostics such as nvidia-smi.
  • Ability to write simple Bash or Python scripts and use Git for version control.

Nice to have

  • Hands-on experience with Kubernetes administration, operators, or specialized storage/networking add-ons.
  • Knowledge of RDMA/InfiniBand, HPC concepts, and NCCL for performance troubleshooting.
  • Experience with Infrastructure as Code tools like Ansible or Terraform.
  • Participation in GitOps and CI/CD pipelines using GitHub Actions.
  • Experience with security tooling such as Teleport or Vault.

Culture & Benefits

  • Culture of relentless innovation, ownership, and accountability.
  • Commitment to openness, transparency, and an open-source approach to build trust.
  • Dedicated focus on sustainability and reducing the environmental impact of AI technologies.
  • Fast, efficient, and respectful collaboration within a global team.
  • Inclusive environment with an equal opportunities statement for diverse backgrounds.

Π‘ΡƒΠ΄ΡŒΡ‚Π΅ остороТны: Ссли Ρ€Π°Π±ΠΎΡ‚ΠΎΠ΄Π°Ρ‚Π΅Π»ΡŒ просит Π²ΠΎΠΉΡ‚ΠΈ Π² ΠΈΡ… систСму, ΠΈΡΠΏΠΎΠ»ΡŒΠ·ΡƒΡ iCloud/Google, ΠΏΡ€ΠΈΡΠ»Π°Ρ‚ΡŒ ΠΊΠΎΠ΄/ΠΏΠ°Ρ€ΠΎΠ»ΡŒ, Π·Π°ΠΏΡƒΡΡ‚ΠΈΡ‚ΡŒ ΠΊΠΎΠ΄/ПО, Π½Π΅ Π΄Π΅Π»Π°ΠΉΡ‚Π΅ этого - это мошСнники. ΠžΠ±ΡΠ·Π°Ρ‚Π΅Π»ΡŒΠ½ΠΎ ΠΆΠΌΠΈΡ‚Π΅ "ΠŸΠΎΠΆΠ°Π»ΠΎΠ²Π°Ρ‚ΡŒΡΡ" ΠΈΠ»ΠΈ ΠΏΠΈΡˆΠΈΡ‚Π΅ Π² ΠΏΠΎΠ΄Π΄Π΅Ρ€ΠΆΠΊΡƒ. ΠŸΠΎΠ΄Ρ€ΠΎΠ±Π½Π΅Π΅ Π² Π³Π°ΠΉΠ΄Π΅ β†’