Company hidden
7 days ago

Staff+ Software Engineer, Observability

£325,000 - £390,000 GBP
Work format
hybrid
Employment type
full-time
Grade
middle/senior
English
B2
Country
UK

Job description

TL;DR

Staff+ Software Engineer (Observability): Build and maintain scalable telemetry ingest and storage pipelines across hirify.global’s multi-cluster infrastructure, with an emphasis on high-throughput data pipelines and cost-efficient columnar storage. Reduce mean time to detection and resolution by building cross-signal correlation and AI-assisted diagnostic tooling.

Location: Expected to be in one of the offices at least 25% of the time.

Salary: £325,000 - £390,000 GBP

Company

hirify.global’s mission is to create reliable, interpretable, and steerable AI systems.

What you will do

  • Design and build scalable telemetry ingest and storage pipelines for metrics, logs, traces, and error data.
  • Own and evolve core observability platforms, driving migrations and architectural improvements.
  • Build instrumentation libraries, SDKs, and integrations for engineering teams to emit high-quality telemetry.
  • Drive alerting and SLO infrastructure to define, monitor, and respond to reliability targets.
  • Reduce mean time to detection and resolution by building cross-signal correlation and AI-assisted diagnostic tooling.
  • Partner with Research, Inference, Product, and Infrastructure teams to ensure observability solutions meet their needs.

Requirements

  • 10+ years of experience building and operating large-scale observability or monitoring infrastructure.
  • Deep experience with at least one observability signal area (metrics, logging, tracing, or error analytics).
  • Understanding of high-throughput data pipelines, columnar storage engines, and the tradeoffs involved in ingesting and querying telemetry data at scale.
  • Experience operating or building on top of observability platforms such as Prometheus, Grafana, ClickHouse, OpenTelemetry, or similar systems.
  • Proficiency in at least one of Python, Rust, or Go.
  • Excellent communication skills and a genuine interest in partnering with internal teams to improve their operational visibility and incident response capabilities.

Nice to have

  • Experience operating metrics systems at very high cardinality (hundreds of millions of active time series or more).
  • Experience with log storage migrations or operating columnar databases (ClickHouse, BigQuery, or similar) for analytics workloads.
  • Experience with OpenTelemetry instrumentation, collector pipelines, and tail-based sampling strategies.
  • Experience building or operating alerting platforms, on-call tooling, or SLO frameworks at scale.
  • Experience with Kubernetes-native monitoring, eBPF-based observability, or continuous profiling.
  • Interest in applying AI/LLMs to operational workflows such as automated root cause analysis, anomaly detection, or intelligent alerting.

Culture & Benefits

  • Competitive compensation and benefits.
  • Optional equity donation matching.
  • Generous vacation and parental leave.
  • Flexible working hours.
  • Lovely office space in which to collaborate with colleagues.

The vacancy text is reproduced without changes.