Senior Site Reliability Engineer

Work format

hybrid

Employment type

full-time

Grade

senior

English

Country

Europe, Israel, NA

Vacancy from Hirify Global, our list of international tech companies

Job description

Location: Headquartered in Amsterdam with R&D hubs across Europe, North America, and Israel.

hirify.global is leading a new era in cloud computing to serve the global AI economy.

Overview

You will own the reliability, performance, and observability of the entire inference stack. You will design telemetry pipelines, tune Kubernetes autoscalers, craft Terraform modules, and harden request-routing and retry logic. The goal is scaling the platform smoothly while hitting aggressive cost and reliability targets.

What you will do

Design and refine telemetry pipelines to turn signal into actionable insight.
Tune Kubernetes autoscalers to optimize GPU efficiency.
Craft Terraform modules to build resilience into new clusters.
Harden request-routing and retry logic to absorb transient failures.
Automate incident detection, isolation, and remediation.
Drive a post-mortem culture to prevent incidents from recurring.

Requirements

Deep fluency with Kubernetes, Prometheus, Grafana, and Terraform.
Comfortable scripting in Python or Bash.
Understanding of alert design and SLOs for high-throughput APIs.
Experience with GPU-heavy workloads (vLLM, Triton, Ray, or similar).
Background in MLOps or model-hosting platforms.

Culture & Benefits

Competitive salary and comprehensive benefits package.
Opportunities for professional growth within hirify.global.
Hybrid working arrangements.
Dynamic and collaborative work environment.
