Company hidden
11 hours ago

Principal Platform Engineer (GCP/ML)

Work format
remote (Europe only)
Employment type
fulltime
Grade
senior
English
b2
Country
Ukraine/Poland/Romania +3 more
Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

Principal Platform Engineer (GCP/ML): Architect and lead the infrastructure strategy for a next-generation production ML platform on Google Cloud, with an emphasis on elastic scaling, security, and resilience for high-performance machine learning workloads. Focus on building a paved road for engineers, automating model deployment, complex networking, and CI/CD pipelines, and implementing comprehensive ML observability.

Location: Remote Ukraine

Company

hirify.global helps customers monitor, manage, and protect against risks to their identities and personal information in the digital world, backed by WndrCo, Warburg Pincus, and General Catalyst.

What you will do

  • Design, deploy, and maintain elastically scaling GCP infrastructure and Kubernetes clusters for ML workloads.
  • Build and maintain CI/CD pipelines for training, testing, and deploying ML models using Jenkins, GitHub Actions, or Airflow.
  • Implement observability for model drift, accuracy, latency, performance, and system health, including for non-ML workloads.
  • Deploy monitoring tools that empower teams, and participate in the on-call rotation in support of compliance requirements such as SOC.
  • Collaborate with data, ML, backend, and frontend engineers for smooth production operations.

Requirements

  • 8-10+ years in DevOps/Platform Engineering, with 2+ years operating production ML workloads
  • Deep hands-on experience with GCP (VPC-SC, IAM, Org Policies) and GKE (topology, Helm, Kustomize, ArgoCD)
  • High proficiency in Istio (VirtualServices, mTLS) and Kong API Gateway
  • Expert-level Terraform with an Atlantis/GitOps workflow
  • Experience with secrets/identity tooling (Auth0, Dex, ESO, SOPS), Airflow, and ML serving (Triton, vLLM, MLflow)
  • Experience managing Cloud SQL (PostgreSQL), BigQuery, Elasticsearch, and ClickHouse
  • Upper-intermediate spoken and written English

Nice to have

  • ML observability experience with model accuracy and drift detection
  • Ansible for cluster bootstrap/recovery
  • CKA/CKS or GCP Professional certifications
  • Loki, Grafana, or large-scale ClickHouse

Culture & Benefits

  • High-trust, outcome-focused team solving challenging ML problems quickly
  • Scrappy, nimble organization valuing individual contributions and impact
  • Fast-paced growth environment to learn new technologies, products, and markets
  • Inclusive community committed to no discrimination or barriers to success
