Company hidden
3 days ago

Infrastructure Engineer – Storage Platform (AI)

Work format
onsite
Employment type
fulltime
Grade
middle/senior
English
B2
Country
US

Job description

TL;DR

Infrastructure Engineer (Storage Platform): Operating and maintaining large-scale distributed storage platforms for high-performance AI/ML workloads, with an emphasis on Ceph, high-performance NAS, and Kubernetes integration. Focus on optimizing IOPS, throughput, and latency to keep storage reliable and fast in GPU-intensive environments.

Location: Las Vegas, Nevada. Must be authorized to work in the United States.

Company

hirify.global delivers seamless, secure, and resilient AI compute at scale via a versatile cloud platform that eliminates infrastructure barriers for AI builders.

What you will do

  • Operate and maintain distributed storage platforms, including Ceph (RBD, CephFS, RGW) and high-performance NAS (Weka, VAST Data).
  • Manage the full storage lifecycle, including cluster expansion, upgrades, and migrations.
  • Analyze and troubleshoot storage performance (IOPS, throughput, latency) and remediate bottlenecks across disk subsystems and network paths.
  • Support Kubernetes-integrated storage, managing CSI drivers, StorageClasses, and PersistentVolumes.
  • Execute and improve automation for storage deployment using Ansible, Terraform, and Helm.
  • Collaborate with DevOps, Network, and Compute teams to ensure end-to-end performance across the infrastructure stack.

Requirements

  • 4–7+ years of experience in infrastructure, systems, or storage operations.
  • Strong hands-on experience operating distributed storage systems in production, specifically Ceph.
  • Experience with modern high-performance storage platforms such as Weka or VAST Data.
  • Deep Linux systems knowledge and understanding of data replication and failure domains.
  • Ability to troubleshoot across storage systems, network paths, and compute clients.
  • Authorization to work in the United States is required.

Nice to have

  • Experience supporting AI/ML or HPC workloads.
  • Familiarity with NVMe-based storage architectures and RDMA or high-throughput Ethernet.
  • Experience operating storage across multiple data centers.
  • Exposure to object storage and S3-compatible APIs.

Culture & Benefits

  • Comprehensive health benefits: 100% paid Medical, Dental, and Vision insurance.
  • Financial perks: 401(k) and Company Health Savings Account contributions.
  • Full insurance coverage: Paid Short-Term and Long-Term Disability, and Life insurance options.
  • Time off: Flexible PTO and Paid Holidays.
  • Additional perks: Parental leave and various in-office benefits.
