Senior Site Reliability Engineer (Compute Platforms)
Мэтч & Сопровод
Для мэтча с этой вакансией нужен Plus
Описание вакансии
TL;DR
Senior Site Reliability Engineer (Compute Platforms): Designing and supporting Kubernetes on baremetal and hypervisor platforms in a private cloud environment with an accent on architecture, standardization, and automation. Focus on building Bare Metal as a Service (BMaaS), automating OS provisioning via PXE and Redfish APIs, and implementing GitOps practices with ArgoCD.
Location: Must be based in the United States. Remote for candidates outside a 50-mile radius of San Ramon; Hybrid (3 days/week in office) for those within 50 miles of San Ramon.
Salary: $82,300 - $228,800 USD
Company
is a leading provider of cloud contact center software, bringing the power of cloud innovation to customers worldwide.
What you will do
- Lead the architecture and design of enterprise compute and hypervisor platform solutions across hardware, OS, and container orchestration layers.
- Define standards and automation frameworks for bare metal provisioning and lifecycle management.
- Design and implement Bare Metal as a Service (BMaaS) capabilities for scalable infrastructure consumption.
- Architect production-grade Kubernetes platforms on bare metal utilizing QoS and Affinity with ArgoCD.
- Develop Infrastructure-as-Code using Ansible, Terraform, Helm, and Python/Bash automation.
- Implement CI/CD pipelines for infrastructure updates, patching, upgrades, and hardware validation.
Requirements
- 6+ years of experience in infrastructure, platform engineering, or DevOps with a focus on Compute system design.
- Proven expertise in automating bare metal compute environments at scale, including PXE boot and network-based OS provisioning.
- Deep expertise with Ubuntu Linux and KVM hypervisors (Suse Harvester, OpenStack) in enterprise environments.
- Hands-on experience with Redfish APIs for hardware provisioning and remote lifecycle operations.
- Experience deploying production-grade Kubernetes clusters and managing enterprise hardware (Cisco UCS, Dell PowerEdge, Supermicro, HPE).
- Proficiency with IaC tools (Terraform, Ansible) and scripting in Python or Bash.
Nice to have
- Knowledge of CIS/NIST security and infrastructure lifecycle management.
- ITIL Foundation or advanced certifications in ITSM standard methodology.
- Ubuntu Certifications or CNCF certifications (CKA, CKS).
- Background in telco, edge cloud, or large enterprise environments.
Culture & Benefits
- Comprehensive health, dental, and vision coverage with 100% employee portion covered by the company.
- 401k saving plan with employer matching.
- Generous employee stock purchase plan.
- 12 weeks of paid parental leave and paid volunteer hours.
- Access to an innovative mental health support platform for employees and dependents.
Будьте осторожны: если работодатель просит войти в их систему, используя iCloud/Google, прислать код/пароль, запустить код/ПО, не делайте этого - это мошенники. Обязательно жмите "Пожаловаться" или пишите в поддержку. Подробнее в гайде →