Updated 2 days ago

Infrastructure Support Lead (AI)

Work format

remote (USA only)/hybrid

Employment type

full-time

Grade

lead

English

Country

Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

Infrastructure Support Lead (GPU Cloud/AI): Lead and manage the US infrastructure support team to deliver high-performance GPU cloud services, with an emphasis on team leadership, service delivery (SLA), and operational excellence. The role focuses on managing Kubernetes and Linux-based infrastructure at scale, resolving complex technical incidents across compute and networking layers, and automating operational workflows.

Location: Must be based in the US. Remote-first team, but requires travel to Nscale or customer sites when needed.

Company

Nscale is a GPU cloud engineered for AI, providing cost-effective, high-performance infrastructure for AI startups and large enterprise customers.

What you will do

Manage, coach, and mentor the US Infrastructure team, including performance reviews, development planning, and shift scheduling.
Own ticket queue management and ensure strict adherence to ITIL processes across incidents, requests, and changes.
Drive operational excellence by improving dashboards, alerting, and runbooks to reduce repeat incidents.
Provide hands-on technical support across compute, storage, networking, and Kubernetes environments at scale.
Act as the regional escalation point for high-impact incidents and lead post-incident reviews to identify recurring patterns.
Collaborate with Senior Engineers on technical improvements and the development of operational tooling.

Requirements

Must be based in the United States to lead the regional team.
Proven experience leading or managing engineers in an operational support environment with a focus on meeting SLAs.
Strong Linux systems engineering expertise with a track record of troubleshooting production compute, storage, and network layers.
Experience operating and debugging Kubernetes environments and distributed systems.
Solid understanding of networking fundamentals (L2/L3, routing, VLANs) and high-performance fabrics like RDMA/NVLink.
Proficiency with scripting (Bash, Python) and Infrastructure as Code tools such as Ansible and Terraform.

Nice to have

Experience with GPU platforms (NVIDIA/AMD) and performance diagnostics (nvidia-smi, NCCL).
Exposure to HPC or distributed workloads involving InfiniBand or MPI.
Experience with CI/CD or GitOps tooling.
Experience working in multi-region environments.

Culture & Benefits

Highly competitive compensation package including base salary and equity.
Remote-first work culture with "Human-First Flexibility," granting autonomy to shape your own schedule.
Dynamic progression plan tailored to individual ambitions and ownership of impact.
Collaborative and innovative environment within a fast-growing tech startup.
