Updated 1 month ago

Infrastructure Operations Engineer (AI)

Work format

hybrid

Employment type

full-time

Grade

middle

English

Country

Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

Infrastructure Operations Engineer (AI/GPU Cloud): Manage and optimize data center infrastructure to keep the GPU cloud efficient, reliable, and scalable, with an emphasis on Linux, Kubernetes, and networking. The role focuses on troubleshooting complex infrastructure incidents, implementing automation, and collaborating with cross-functional teams to improve service delivery.

Location: Must be based in North Carolina, US (Hybrid/Travel required)

Company

Nscale is a GPU cloud provider engineered for AI, providing high-performance infrastructure for AI startups and enterprises.

What you will do

Handle day-to-day tickets and alerts in the support rotation, escalating issues to Engineering when necessary.
Manage and resolve infrastructure tickets using the internal system, maintaining clear communication with all parties.
Execute runbooks to resolve common issues and propose incremental improvements and fixes.
Monitor, troubleshoot, and triage platform issues, capturing logs for efficient handover.
Identify and implement automation opportunities to optimize operational processes.
Travel to Nscale or customer locations for deployments, troubleshooting, and operational tasks.

Requirements

Location: Must be based in North Carolina, USA
Strong fundamentals in Linux CLI, systemd, filesystems, permissions, and basic networking tools.
Solid understanding of IP addressing, subnets, VLANs, routing, DNS, and firewalls.
Experience with Kubernetes core concepts (nodes, pods, services, logs) and basic troubleshooting.
Ability to write simple Bash or Python scripts and use Git for version control.
Familiarity with GPU diagnostics (e.g., nvidia-smi) and observability dashboards.

Nice to have

Hands-on Kubernetes administration, operators, and storage/networking add-ons.
Knowledge of RDMA/InfiniBand, NCCL, or job schedulers for HPC.
Experience with Infrastructure as Code (Ansible, Terraform) and GitOps/CI/CD (GitHub Actions).
Experience with security tools like Teleport or Vault.

Culture & Benefits

Highly competitive compensation package including base salary and equity.
Dynamic progression plan tailored to individual ambitions.
"Human-First" flexibility and autonomy in shaping the workday.
Collaborative, remote-first culture within a fast-growing AI startup.
