Company hidden
Posted 22 hours ago

Senior Site Reliability Engineer (AI/ML Inference)

Work format
Remote (Europe/United States only)
Employment type
Full-time
Grade
Senior
English
B2
Country
UK, US, CR, Netherlands, Germany
Job from Hirify RU Global, a list of companies with Eastern European roots

Job description

TL;DR

Senior Site Reliability Engineer (AI/ML Inference): Own the reliability, performance, and observability of a massive-scale AI inference platform, with an emphasis on designing telemetry, optimizing Kubernetes autoscalers, and hardening distributed back-end systems. The role focuses on building self-healing systems, debugging performance from the kernel to the application layer, and ensuring flawless behavior under extreme load.

Location: Remote (Europe or United States)

Company

hirify.global is an AI cloud computing company serving the global AI economy, building tools and resources for customers to solve real-world AI/ML challenges.

What you will do

  • Own the reliability, performance, and observability of the entire inference stack.
  • Design and refine telemetry pipelines (metrics, logs, traces) for actionable insight.
  • Tune Kubernetes autoscalers and craft Terraform modules for cluster resilience.
  • Harden request-routing and retry logic to prevent user-facing failures.
  • Detect, isolate, and remediate problems using automation and runbooks.
  • Drive post-mortem culture to prevent incident recurrence.

Requirements

  • Deep fluency with Kubernetes, Prometheus, Grafana, and Terraform.
  • Proficiency in infrastructure-as-code principles and practices.
  • Comfortable scripting in Python or Bash.
  • Understanding of alert design and SLOs for high-throughput APIs.
  • Experience with distributed back-end failures in production environments.

Nice to have

  • Experience shepherding GPU-heavy workloads (e.g., with vLLM, Triton, Ray).
  • Background in MLOps or model-hosting platforms.

Culture & Benefits

  • Competitive salary and comprehensive benefits package.
  • Opportunities for professional growth within the company.
  • Flexible working arrangements.
  • Dynamic and collaborative work environment that values initiative and innovation.
