Virtualization Operations Engineer (AI)
Job description
TL;DR
Virtualization Operations Engineer (Linux/KVM): Operate and maintain large-scale infrastructure platforms for high-performance AI workloads, with an emphasis on hypervisor stability, performance tuning, and VM lifecycle management. The role focuses on troubleshooting resource contention in GPU-intensive environments and automating infrastructure deployments.
Location: On-site in Las Vegas, Nevada. Authorization to work in the United States is required.
Company
The company provides seamless, secure, and resilient AI compute at scale via a versatile cloud platform that eliminates infrastructure barriers for AI builders.
What you will do
- Operate and maintain large-scale Proxmox and KVM-based virtualization environments.
- Manage the full VM lifecycle, including provisioning, configuration, migration, and decommissioning.
- Monitor platform health and resolve issues related to host failures and resource contention (CPU, memory, disk, network).
- Execute infrastructure changes such as cluster expansions, host maintenance, and upgrades.
- Standardize deployments using automation tools like Ansible to reduce manual intervention.
- Collaborate with DevOps, Network, and Storage engineering teams on incident response and root cause analysis.
Requirements
- 4–7+ years of experience in infrastructure, systems, or platform operations.
- Hands-on experience with Linux-based virtualization (KVM/QEMU, Proxmox, VMware) and strong Linux fundamentals.
- Proven ability to troubleshoot CPU/memory contention and disk I/O bottlenecks.
- Experience with infrastructure automation tools (e.g., Ansible).
- Must be authorized to work in the United States.
Nice to have
- Experience operating infrastructure at scale (100+ hosts).
- Familiarity with GPU-based systems, NUMA awareness, and performance tuning.
- Exposure to high-throughput networking (bonding, VLANs, SR-IOV) and distributed storage.
- Experience with Kubernetes or other container platforms, and with cloud service provider (CSP) environments.
Culture & Benefits
- Comprehensive health benefits: 100% paid Medical, Dental, and Vision insurance.
- Financial security: 401(k) plan and company HSA contributions.
- Insurance: Paid Short Term and Long Term Disability, Life, and Voluntary Supplemental options.
- Work-life balance: Flexible PTO and paid holidays.
- Equity: Stock options.
- Additional perks: Parental leave, Employee Assistance Program, and in-office perks.