Company hidden
1 day ago

Observability Engineer (AI)

Work format: onsite
Employment type: full-time
English: B2
Country: US

Job description

TL;DR

Observability Engineer (AI): Own and evolve the observability and monitoring platform for AI infrastructure, with an emphasis on designing and maintaining high-quality metrics pipelines, dashboards, and actionable alerts. The role focuses on establishing observability standards across services, partnering with engineering teams on instrumentation, and supporting incident response.

Location: Onsite in Las Vegas, Nevada, USA

Company

hirify.global Cloud builds seamless, secure, reliable, and resilient AI infrastructure at scale, empowering builders and supporting AI innovation.

What you will do

  • Own and evolve the observability and monitoring platform, with Grafana and Prometheus at its core.
  • Design, build, and maintain high-quality metrics pipelines.
  • Create clear, actionable Grafana dashboards and define meaningful, low-noise alerts.
  • Establish and enforce observability standards across services (metrics, logs, traces).
  • Partner with engineering teams to instrument applications correctly.
  • Support incident response by helping teams understand issues quickly.

Requirements

  • Strong hands-on experience with Grafana and Prometheus.
  • Deep understanding of metrics-based observability.
  • Experience designing monitoring and alerting systems at scale.
  • Strong knowledge of alerting best practices (burn rates, SLO-based alerts).
  • Experience working with distributed systems and cloud or Kubernetes environments.
  • Ability to reason about system behavior using telemetry.

Nice to have

  • Experience with OpenTelemetry.
  • Familiarity with logs and traces (Loki, Tempo, Jaeger).
  • Kubernetes observability experience.
  • Infrastructure-as-Code experience (Terraform, Helm).

Culture & Benefits

  • Mission-driven company with competitive salary and stock options.
  • 100% paid Medical, Dental, and Vision insurance.
  • Flexible Spending Account and 401(k).
  • Flexible PTO, Paid Holidays, and Parental Leave.
  • Mental Health Benefits through Spring Health.
  • Opportunity to build the future of AI infrastructure at exascale.
