Infrastructure Engineer – Storage Platform (AI)
Job description
TL;DR
Infrastructure Engineer (Storage Platform): Operate and maintain large-scale distributed storage platforms for high-performance AI/ML workloads, with an emphasis on Ceph, high-performance NAS, and Kubernetes integration. The role focuses on optimizing IOPS, throughput, and latency to keep storage reliable and fast in GPU-intensive environments.
Location: Las Vegas, Nevada. Must be authorized to work in the United States.
Company
delivers seamless, secure, and resilient AI compute at scale via a versatile cloud platform that eliminates infrastructure barriers for AI builders.
What you will do
- Operate and maintain distributed storage platforms, including Ceph (RBD, CephFS, RGW) and high-performance NAS (Weka, VAST Data).
- Manage the full storage lifecycle, including cluster expansion, upgrades, and migrations.
- Analyze and troubleshoot storage performance (IOPS, throughput, latency) and remediate bottlenecks across disk subsystems and network paths.
- Support Kubernetes-integrated storage, managing CSI drivers, StorageClasses, and PersistentVolumes.
- Execute and improve automation for storage deployment using Ansible, Terraform, and Helm.
- Collaborate with DevOps, Network, and Compute teams to ensure end-to-end performance across the infrastructure stack.
Requirements
- 4–7+ years of experience in infrastructure, systems, or storage operations.
- Strong hands-on experience operating distributed storage systems in production, specifically Ceph.
- Experience with modern high-performance storage platforms such as Weka or VAST Data.
- Deep Linux systems knowledge and understanding of data replication and failure domains.
- Ability to troubleshoot across storage systems, network paths, and compute clients.
- Authorization to work in the United States is required.
Nice to have
- Experience supporting AI/ML or HPC workloads.
- Familiarity with NVMe-based storage architectures and RDMA or high-throughput Ethernet.
- Experience operating storage across multiple data centers.
- Exposure to object storage and S3-compatible APIs.
Culture & Benefits
- Comprehensive health benefits: 100% paid Medical, Dental, and Vision insurance.
- Financial perks: 401(k) and Company Health Savings Account contributions.
- Full insurance coverage: Paid short-term and long-term disability, plus life insurance options.
- Time off: Flexible PTO and Paid Holidays.
- Additional perks: Parental leave and various in-office benefits.