Company hidden
Posted 2 days ago

Senior Application Reliability Engineer (Java)

Work format: hybrid
Employment type: full-time
Grade: senior
English: B2
Country: Poland, Bulgaria
Listing from Hirify Global, a list of international tech companies

Job description

TL;DR

Senior Application Reliability Engineer (Java): Identifying and resolving reliability gaps in Java applications, implementing resilience patterns, and guiding development teams on stability improvements. Focus on troubleshooting production systems, instrumenting with OpenTelemetry, and driving cross-team initiatives for operational excellence.

Location: Remote, Hybrid or Onsite in Bulgaria or Poland

Company

hirify.global is an AI-first global tech company with 25+ years of engineering leadership, partnering with Fortune 500 clients on digital transformation and AI platforms.

What you will do

  • Review Java services to identify and address reliability gaps.
  • Introduce and implement resilience patterns like rate limiting, backpressure, and circuit breakers.
  • Troubleshoot production Java systems, perform root cause analysis, and conduct load testing.
  • Implement OpenTelemetry instrumentation for metrics, logs, and traces in Java services.
  • Collaborate with product owners and stakeholders to explain reliability concepts and lead discussions on SLOs/SLIs.
  • Drive cross-team initiatives to improve reliability, observability, and operational practices.

Requirements

  • Demonstrated expertise in SLOs, SLIs, error budgets, and tying them to release velocity and operational decision-making.
  • Experience creating or improving Service Level Contracts across distributed systems.
  • Ability to apply failure-mode analysis, chaos practices, and resilience engineering patterns.
  • Deep understanding of traffic management techniques: rate limiting, backpressure, load shedding, circuit breakers, concurrency limits.
  • Strong familiarity with container orchestration (K8s, ECS) from a production reliability perspective rather than simple deployment automation.
  • Hands-on experience implementing or improving distributed tracing, structured logging, and actionable metrics.
  • Experience running or participating in incident response, including on-call rotations and blameless post-mortems.
  • English: Intermediate+ required

Culture & Benefits

  • Flexible work options including in-office, hybrid, or remote.
  • Opportunities for international projects and ongoing learning with reimbursement.
  • Medical coverage and a well-being program.
  • Recognition programs and team events.
  • Referral bonuses and top-tier equipment provided.
  • Culture that leads with trust, respect, open dialogue, creative freedom, and mentorship.
