Назад
Company hidden
обновлено 1 день назад

Kubernetes Platform Engineer (AI)

Формат работы
onsite
Тип работы
fulltime
Грейд
senior
Английский
b2
Страна
US
Вакансия из списка Hirify.GlobalВакансия из Hirify Global, списка международных tech-компаний
Для мэтча и отклика нужен Plus

Мэтч & Сопровод

Для мэтча с этой вакансией нужен Plus

Описание вакансии

Текст:
/

TL;DR

Senior Kubernetes Platform Engineer (AI): Building and optimizing seamless, secure, reliable, and resilient AI infrastructure at scale, with an accent on maintaining stability and reliability of bare-metal Kubernetes infrastructure. Focus on troubleshooting, incident response, and day-to-day cluster operations across multi-tenant workloads.

Location: Las Vegas, Nevada, USA (Onsite)

Company

hirify.global Cloud is dedicated to building secure, reliable, and resilient AI infrastructure at scale to empower innovation.

What you will do

  • Own and troubleshoot operational issues within Kubernetes environments.
  • Maintain and monitor core services like Cilium, HAProxy, and Prometheus.
  • Ensure uptime, performance, and reliability of multi-tenant clusters.
  • Assist with Ingress/Egress connectivity and network debugging.
  • Support internal and customer teams in secure, isolated VPC environments.
  • Collaborate with senior engineers on automation and cluster lifecycle improvements.

Requirements

  • Bachelor of Science in Computer Science, Computer Engineering, or a related technical field, or equivalent practical experience.
  • 5+ years experience in DevOps, SRE, or Linux infrastructure roles.
  • 4+ years of hands-on experience with Kubernetes in production.
  • 3+ years designing and operating multi-tenant Kubernetes platforms at CSP or hyperscaler scale.
  • Proven experience implementing production-grade cluster authentication (OIDC/SSO integration, RBAC policies) and advanced network design (CNI selection/configuration, network policies, service mesh architecture, cross-cluster networking).
  • Strong infrastructure-as-code mindset (Helm, Terraform, Ansible).
  • Solid experience with monitoring and logging tools (Prometheus, Grafana, Loki).

Nice to have

  • Experience with RKE2, Rancher, or similar platforms.
  • Experience troubleshooting or supporting AI or GPU-based workloads.
  • Familiarity with HAProxy, Cilium, or other Kubernetes ingress/networking tools.

Culture & Benefits

  • Mission-driven company with competitive salary and stock options.
  • 100% paid Medical, Dental, and Vision insurance.
  • Flexible PTO and Paid Holidays.
  • 401(k) and Parental Leave.
  • Flexible Spending Account, Short Term Disability Insurance, Life and Voluntary Supplemental Insurance.
  • Mental Health Benefits through Spring Health.

Будьте осторожны: если работодатель просит войти в их систему, используя iCloud/Google, прислать код/пароль, запустить код/ПО, не делайте этого - это мошенники. Обязательно жмите "Пожаловаться" или пишите в поддержку. Подробнее в гайде →

Текст вакансии взят без изменений

Источник - загрузка...