Infrastructure Engineer – Storage Platform (AI)

Work format

On-site

Employment type

Full-time

Grade

Middle/Senior

Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

Infrastructure Engineer (Storage Platform): Operate and maintain large-scale distributed storage platforms for high-performance AI/ML workloads, with an emphasis on Ceph, high-performance NAS, and Kubernetes integration. The role centers on optimizing IOPS, throughput, and latency to keep GPU-intensive environments reliable and performant.

Location: Las Vegas, Nevada. Must be authorized to work in the United States.

Company

hirify.global delivers seamless, secure, and resilient AI compute at scale via a versatile cloud platform that eliminates infrastructure barriers for AI builders.

What you will do

Operate and maintain distributed storage platforms, including Ceph (RBD, CephFS, RGW) and high-performance NAS (Weka, VAST Data).
Manage the full storage lifecycle, including cluster expansion, upgrades, and migrations.
Analyze and troubleshoot storage performance (IOPS, throughput, latency) and remediate bottlenecks across disk subsystems and network paths.
Support Kubernetes-integrated storage, managing CSI drivers, StorageClasses, and PersistentVolumes.
Execute and improve automation for storage deployment using Ansible, Terraform, and Helm.
Collaborate with DevOps, Network, and Compute teams to ensure end-to-end performance across the infrastructure stack.

Requirements

4–7+ years of experience in infrastructure, systems, or storage operations.
Strong hands-on experience operating distributed storage systems in production, specifically Ceph.
Experience with modern high-performance storage platforms such as Weka or VAST Data.
Deep Linux systems knowledge and understanding of data replication and failure domains.
Ability to troubleshoot across storage systems, network paths, and compute clients.
Authorization to work in the United States is required.

Nice to have

Experience supporting AI/ML or HPC workloads.
Familiarity with NVMe-based storage architectures and RDMA or high-throughput Ethernet.
Experience operating storage across multiple data centers.
Exposure to object storage and S3-compatible APIs.

Culture & Benefits

Comprehensive health benefits: 100% paid Medical, Dental, and Vision insurance.
Financial perks: 401(k) and Company Health Savings Account contributions.
Full insurance coverage: Paid Short Term and Long Term Disability, and Life insurance options.
Time off: Flexible PTO and Paid Holidays.
Additional perks: Parental leave and various in-office benefits.
