Company hidden
12 hours ago

Senior DevOps Engineer (AI)

Job type
Full-time
Grade
Senior
English
B2
Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

Senior DevOps Engineer (AI/Infrastructure): Shape and scale a GenAI platform across 40+ Kubernetes clusters in GCP and AWS, with an emphasis on GPU workloads and multi-tenant scheduling. The role focuses on optimizing production-grade AI training and inference, managing cost and reliability tradeoffs, and improving developer experience.

Company

hirify.global Labs is pioneering the development of Foundation Models and AI Systems for enterprises, accelerating the adoption of Generative AI in production.

What you will do

  • Design, operate, and scale multi-cluster Kubernetes environments across GCP and AWS.
  • Manage multi-tenant GPU scheduling for training and inference at scale, focusing on capacity, utilization, and cost.
  • Lead the developer platform, providing self-service tools and automation for R&D.
  • Optimize system cost, performance, and reliability through monitoring and capacity planning.
  • Establish security and governance standards for RBAC, IAM, and cloud compliance across the infrastructure.
  • Drive the adoption of GitOps and Infrastructure as Code using Terraform, Helm, Crossplane, and ArgoCD.

Requirements

  • 7+ years of experience in DevOps or SRE.
  • Deep expertise in large-scale, multi-cluster, enterprise-grade Kubernetes environments on GCP and/or AWS.
  • Hands-on experience operating production-scale GPU workloads and multi-tenant scheduling.
  • Strong background in Infrastructure as Code (Terraform, Helm) and GitOps practices and tooling (ArgoCD, Crossplane, FluxCD).
  • Proficiency with observability and monitoring tools such as Prometheus, Grafana, Datadog, and OpenTelemetry.

Nice to have

  • Experience with self-hosted on-prem deployments and managed private VPC (Bring Your Own Cloud) setups.
  • Experience designing and managing CRDs and custom controllers.
  • DevSecOps experience including security automation and compliance frameworks.
  • Experience operating GenAI or large-scale SaaS platforms.
