Company hidden
2 days ago

Observability Platform Engineer (AI)

Job type
Full-time
English
B2
Country
UK

Job description

TL;DR

Observability Platform Engineer (AI): Design, deploy, and operate global-scale monitoring, logging, and tracing systems for GPU-powered AI infrastructure, with an emphasis on scalability and automation. The role focuses on instrumenting distributed systems, automating observability via IaC, and improving incident detection and resolution.

Location: UK

Company

hirify.global is a GPU cloud provider engineered for AI, providing high-performance infrastructure for AI start-ups and large enterprise customers.

What you will do

  • Design and maintain global-scale observability platforms including monitoring, logging, tracing, and alerting.
  • Deploy and manage tools such as Prometheus, Grafana, Datadog, ELK/OpenSearch, OpenTelemetry, and Jaeger.
  • Automate observability infrastructure using Infrastructure-as-Code and CI/CD pipelines.
  • Partner with SRE and Engineering teams to instrument applications and systems for telemetry.
  • Develop real-time dashboards and alerts to provide visibility into infrastructure health.
  • Document observability standards, tools, and processes.

Requirements

  • Strong experience in designing and operating observability platforms at scale.
  • Hands-on expertise with Prometheus, Grafana, Datadog, ELK/OpenSearch, OpenTelemetry, or Jaeger.
  • Experience with cloud-native infrastructure including Kubernetes, containers, and service meshes.
  • Proficiency in scripting and automation using Python, Go, or Bash.
  • Knowledge of Infrastructure-as-Code tools like Terraform, Ansible, or Pulumi.
  • Must be located in the UK.

Nice to have

  • Experience with AI/ML workload observability.
  • Familiarity with hyperscale datacenter environments.
  • Knowledge of AIOps and advanced telemetry analytics.
  • Exposure to sustainability monitoring and efficiency metrics.

Culture & Benefits

  • Culture of relentless innovation, ownership, and accountability.
  • Environment built on openness, transparency, and excellence.
  • Commitment to an inclusive, diverse, and equitable workplace.
