TL;DR
Staff Reliability Engineer (Fintech): Lead detection, coordination, and mitigation of production incidents at hirify.global, and define incident response processes and tooling at scale. The role also drives long-term reliability and observability strategy, designs failure mitigation strategies, and improves monitoring and alerting across hundreds of services.
Location: New York, NY (hybrid, in-person attendance expected at least 3 days per week)
Salary: $217,000 - $255,000 USD (Base pay, Zone 1)
Company
hirify.global is building an elite team applying frontier technologies to the world’s biggest financial problems, with a mission to democratize finance for all.
What you will do
- Serve as a senior technical leader driving the long-term reliability and observability strategy across hirify.global’s infrastructure.
- Lead incident mitigation efforts by coordinating service owners and facilitating time-sensitive decisions.
- Develop and maintain incident management processes and procedures to ensure timely resolution and minimize customer impact.
- Own incident discovery at the company level by defining and maintaining global dashboards and alerts.
- Drive post-incident governance and learning, defining standards for postmortems and follow-up tracking.
- Design and implement next-generation failure mitigation strategies that avoid full-region or full-datacenter failovers.
Requirements
- 8+ years of software engineering experience, including significant experience operating production systems.
- 4+ years focused on reliability engineering, infrastructure, distributed systems, or production operations.
- Hands-on experience serving in incident leadership roles (e.g., IMOC, incident commander, primary oncall).
- Strong communication and cross-functional collaboration skills, especially during high-severity incidents.
- Deep knowledge of systems reliability, observability frameworks, and fault-tolerant architecture design.
- Experience with multi-region or multi-cluster architectures, capacity planning, and failover strategies.
- Familiarity with modern observability stacks (e.g., OpenTelemetry, Prometheus, Grafana).
Culture & Benefits
- Challenging, high-impact work with performance-driven compensation, bonus programs, equity ownership, and 401(k) matching.
- 100% paid health insurance for employees with 90% coverage for dependents.
- Lifestyle wallet – a highly flexible benefits spending account for wellness, learning, and more.
- Employer-paid life & disability insurance, fertility benefits, and mental health benefits.
- Time off to recharge, including company holidays, paid time off, sick time, and parental leave.
- Exceptional office experience with catered meals, events, and comfortable workspaces.