Company hidden
7 days ago

Staff Reliability Engineer (Fintech)

$217,000 - $255,000
Work format
hybrid
Employment type
fulltime
Grade
principal
English
b2
Country
US
Job from Hirify RU Global, a list of companies with Eastern European roots

Job description

TL;DR

Staff Reliability Engineer (Fintech): Leading incident detection, coordination, and mitigation for hirify.global's production incidents, defining incident response processes and tooling at scale. Focus on driving long-term reliability and observability strategy, designing failure mitigation strategies, and improving monitoring and alerting across hundreds of services.

Location: New York, NY (hybrid, in-person attendance expected at least 3 days per week)

Salary: $217,000 - $255,000 USD (Base pay, Zone 1)

Company

hirify.global is building an elite team applying frontier technologies to the world’s biggest financial problems, with a mission to democratize finance for all.

What you will do

  • Serve as a senior technical leader driving the long-term reliability and observability strategy across hirify.global’s infrastructure.
  • Lead incident mitigation efforts by coordinating service owners and facilitating time-sensitive decisions.
  • Develop and maintain incident management processes and procedures to ensure timely resolution and minimize customer impact.
  • Own incident discovery at the company level by defining and maintaining global dashboards and alerts.
  • Drive post-incident governance and learning, defining standards for postmortems and follow-up tracking.
  • Design and implement next-generation failure mitigation strategies that avoid full-region or full-datacenter failovers.

Requirements

  • 8+ years of software engineering experience, including significant experience operating production systems.
  • 4+ years focused on reliability engineering, infrastructure, distributed systems, or production operations.
  • Hands-on experience serving in incident leadership roles (e.g., IMOC, incident commander, primary oncall).
  • Strong communication and cross-functional collaboration skills, especially during high-severity incidents.
  • Deep knowledge of systems reliability, observability frameworks, and fault-tolerant architecture design.
  • Experience with multi-region or multi-cluster architectures, capacity planning, and failover strategies.
  • Familiarity with modern observability stacks (e.g., OpenTelemetry, Prometheus, Grafana).

Culture & Benefits

  • Challenging, high-impact work with performance-driven compensation, bonus programs, equity ownership, and 401(k) matching.
  • 100% paid health insurance for employees with 90% coverage for dependents.
  • Lifestyle wallet – a highly flexible benefits spending account for wellness, learning, and more.
  • Employer-paid life & disability insurance, fertility benefits, and mental health benefits.
  • Time off to recharge including company holidays, paid time off, sick time, and parental leave.
  • Exceptional office experience with catered meals, events, and comfortable workspaces.

The job text is reproduced without changes.
