Staff Infrastructure Engineer (Storage)
Job description
TL;DR
Staff Infrastructure Engineer (Storage): Designing and operating large-scale distributed storage platforms for high-performance AI/ML workloads, with an emphasis on system resilience, scalability, and performance tuning. The role focuses on integrating Ceph with Kubernetes and resolving complex bottlenecks across disk subsystems and RDMA network paths.
Location: Las Vegas, Nevada (Must be authorized to work in the United States)
Company
The company provides seamless and resilient AI compute at scale via a versatile cloud platform that eliminates infrastructure barriers for AI builders.
What you will do
- Design and evolve storage architectures supporting Kubernetes and high-performance compute workloads, prioritizing resilience and failure-domain awareness.
- Own production storage platforms, including Ceph (RBD, CephFS, RGW) and high-performance NAS (Weka, VAST).
- Lead lifecycle operations: cluster design, deployment, scaling, upgrades, and migrations.
- Analyze storage performance (IOPS, throughput, latency) and resolve bottlenecks across disk subsystems and network paths.
- Implement Kubernetes storage patterns including CSI drivers and StorageClasses for stateful workloads.
- Develop automation for storage deployment and lifecycle management using Ansible, Terraform, and Helm.
Requirements
- 7+ years of experience in infrastructure, storage, or distributed systems.
- Deep hands-on experience with Ceph (RBD, CephFS, RGW) in production environments.
- Experience with high-performance storage platforms such as Weka or VAST Data.
- Strong Linux systems expertise and the ability to troubleshoot across storage, network, and compute layers.
- Must have valid authorization to work in the United States.
Nice to have
- Experience supporting AI/ML or HPC workloads.
- Familiarity with NVMe-based architectures and RDMA or high-throughput Ethernet.
- Experience integrating storage with Kubernetes at scale across multiple data centers.
- Exposure to object storage and S3-compatible APIs.
Culture & Benefits
- Equity through stock options.
- 100% paid medical, dental, and vision insurance.
- Company contributions to a Health Savings Account (HSA) and a 401(k) plan.
- Flexible PTO and paid holidays.
- Comprehensive insurance coverage, including short- and long-term disability and life insurance.
- Parental leave and various in-office perks.