Staff Reliability Engineer (Fintech)

$217,000 - $255,000

Work format

hybrid

Employment type

full-time

Grade

principal

English

Country

Job description

Text:

TL;DR

Staff Reliability Engineer (Fintech): Lead detection, coordination, and mitigation of hirify.global's production incidents, and define incident response processes and tooling at scale. The role focuses on driving long-term reliability and observability strategy, designing failure mitigation strategies, and improving monitoring and alerting across hundreds of services.

Location: New York, NY (hybrid, in-person attendance expected at least 3 days per week)

Salary: $217,000 - $255,000 USD (Base pay, Zone 1)

Company

hirify.global is building an elite team applying frontier technologies to the world’s biggest financial problems, with a mission to democratize finance for all.

What you will do

Serve as a senior technical leader driving the long-term reliability and observability strategy across hirify.global’s infrastructure.
Lead incident mitigation efforts by coordinating service owners and facilitating time-sensitive decisions.
Develop and maintain incident management processes and procedures to ensure timely resolution and minimize customer impact.
Own incident discovery at the company level by defining and maintaining global dashboards and alerts.
Drive post-incident governance and learning, defining standards for postmortems and follow-up tracking.
Design and implement next-generation failure mitigation strategies that avoid full-region or full-datacenter failovers.

Requirements

8+ years of software engineering experience, including significant experience operating production systems.
4+ years focused on reliability engineering, infrastructure, distributed systems, or production operations.
Hands-on experience serving in incident leadership roles (e.g., IMOC, incident commander, primary oncall).
Strong communication and cross-functional collaboration skills, especially during high-severity incidents.
Deep knowledge of systems reliability, observability frameworks, and fault-tolerant architecture design.
Experience with multi-region or multi-cluster architectures, capacity planning, and failover strategies.
Familiarity with modern observability stacks (e.g., OpenTelemetry, Prometheus, Grafana).

Culture & Benefits

Challenging, high-impact work with performance-driven compensation, bonus programs, equity ownership, and 401(k) matching.
100% paid health insurance for employees with 90% coverage for dependents.
Lifestyle wallet – a highly flexible benefits spending account for wellness, learning, and more.
Employer-paid life & disability insurance, fertility benefits, and mental health benefits.
Time off to recharge including company holidays, paid time off, sick time, and parental leave.
Exceptional office experience with catered meals, events, and comfortable workspaces.