Company hidden
Updated 2 days ago

Staff Infrastructure Engineer (Storage)

Work format
onsite
Employment type
fulltime
Grade
lead
English
b2
Country
US

Job description

TL;DR

Staff Infrastructure Engineer (Storage): Designing and operating large-scale distributed storage platforms for high-performance AI/ML workloads, with an emphasis on system resilience, scalability, and performance tuning. Focus on integrating Ceph with Kubernetes and solving complex bottlenecks across disk subsystems and RDMA network paths.

Location: Las Vegas, Nevada (Must be authorized to work in the United States)

Company

hirify.global provides seamless and resilient AI compute at scale via a versatile cloud platform that eliminates infrastructure barriers for AI builders.

What you will do

  • Design and evolve storage architectures supporting Kubernetes and high-performance compute workloads, prioritizing resilience and failure-domain awareness.
  • Own production storage platforms, including Ceph (RBD, CephFS, RGW) and high-performance NAS (Weka, VAST).
  • Lead lifecycle operations: cluster design, deployment, scaling, upgrades, and migrations.
  • Analyze storage performance (IOPS, throughput, latency) and resolve bottlenecks across disk subsystems and network paths.
  • Implement Kubernetes storage patterns including CSI drivers and StorageClasses for stateful workloads.
  • Develop automation for storage deployment and lifecycle management using Ansible, Terraform, and Helm.

Requirements

  • 7+ years of experience in infrastructure, storage, or distributed systems.
  • Deep hands-on experience with Ceph (RBD, CephFS, RGW) in production environments.
  • Experience with high-performance storage platforms such as Weka or VAST Data.
  • Strong Linux systems expertise and the ability to troubleshoot across storage, network, and compute layers.
  • Must have valid authorization to work in the United States.

Nice to have

  • Experience supporting AI/ML or HPC workloads.
  • Familiarity with NVMe-based architectures and RDMA or high-throughput Ethernet.
  • Experience integrating storage with Kubernetes at scale across multiple data centers.
  • Exposure to object storage and S3-compatible APIs.

Culture & Benefits

  • Equity through stock options.
  • 100% paid medical, dental, and vision insurance.
  • Company contributions to Health Savings Account (HSA) and 401(k) plan.
  • Flexible PTO and paid holidays.
  • Comprehensive insurance coverage, including short- and long-term disability and life insurance.
  • Parental leave and various in-office perks.
