Kubernetes Platform Engineer (AI)
Job description
TL;DR
Senior Kubernetes Platform Engineer (AI): Building and optimizing seamless, secure, reliable, and resilient AI infrastructure at scale, with an emphasis on maintaining the stability and reliability of bare-metal Kubernetes infrastructure. The role focuses on troubleshooting, incident response, and day-to-day cluster operations across multi-tenant workloads.
Location: Las Vegas, Nevada, USA (Onsite)
Company
Cloud is dedicated to building secure, reliable, and resilient AI infrastructure at scale to empower innovation.
What you will do
- Own and troubleshoot operational issues within Kubernetes environments.
- Maintain and monitor core services like Cilium, HAProxy, and Prometheus.
- Ensure uptime, performance, and reliability of multi-tenant clusters.
- Assist with Ingress/Egress connectivity and network debugging.
- Support internal and customer teams in secure, isolated VPC environments.
- Collaborate with senior engineers on automation and cluster lifecycle improvements.
Requirements
- Bachelor of Science in Computer Science, Computer Engineering, or a related technical field, or equivalent practical experience.
- 5+ years of experience in DevOps, SRE, or Linux infrastructure roles.
- 4+ years of hands-on experience with Kubernetes in production.
- 3+ years designing and operating multi-tenant Kubernetes platforms at CSP or hyperscaler scale.
- Proven experience implementing production-grade cluster authentication (OIDC/SSO integration, RBAC policies) and advanced network design (CNI selection/configuration, network policies, service mesh architecture, cross-cluster networking).
- Strong infrastructure-as-code mindset (Helm, Terraform, Ansible).
- Solid experience with monitoring and logging tools (Prometheus, Grafana, Loki).
Nice to have
- Experience with RKE2, Rancher, or similar platforms.
- Experience troubleshooting or supporting AI or GPU-based workloads.
- Familiarity with HAProxy, Cilium, or other Kubernetes ingress/networking tools.
Culture & Benefits
- Mission-driven company with competitive salary and stock options.
- 100% paid Medical, Dental, and Vision insurance.
- Flexible PTO and Paid Holidays.
- 401(k) and Parental Leave.
- Flexible Spending Account, Short Term Disability Insurance, Life and Voluntary Supplemental Insurance.
- Mental Health Benefits through Spring Health.