Staff Software Engineer (Reliability)

$212,000 - $286,200

Work format

Remote (USA only)

Employment type

Full-time

Grade

Lead

English

Country

US/Canada

Vacancy from Hirify RU Global, a list of companies with Eastern European roots

Job description

TL;DR

Staff Software Engineer (Reliability): Own the reliability of operating hirify.global Cloud end to end, with an emphasis on defining and measuring reliability expectations and hardening systems through gamedays and chaos testing. The focus is on building tooling and practices that make reliability visible and continuously improve it across services and operational processes.

Location: Remote (United States or Canada)

Salary: $212,000 - $286,200

Company

hirify.global is an open source programming model that can simplify code, make applications more reliable, and help developers focus on the important things like delivering features faster.

What you will do

Own reliability outcomes for operating hirify.global Cloud end to end, partnering across engineering, infrastructure, and product to drive measurable improvements.
Define, implement, and evolve reliability targets and associated practices, including alerting thresholds, operational readiness criteria, and escalation paths.
Plan and run gamedays to validate incident response, operational procedures, and cross-team coordination under realistic failure scenarios.
Build and scale a chaos testing program that exercises failure modes safely and drives remediation work that reduces real risk.
Improve observability standards (metrics, logs, traces, dashboards) so reliability signals are consistent, actionable, and easy to audit.
Drive post-incident learning and corrective actions, ensuring fixes are durable and reduce recurrence risk over time.

Requirements

Strong computer science fundamentals, especially in distributed systems, concurrency, and performance.
Demonstrated ability to design and build complex systems that operate reliably under high load and partial failure.
Experience driving reliability improvements across multiple services.
Hands-on experience with at least one of: gamedays, chaos testing, load testing, or building reliability scorecards.
Strong judgment in ambiguous situations, including the ability to prioritize reliability work based on risk and impact.
Excellent communication skills.

Nice to have

Experience operating multi-tenant systems and designing protections against noisy-neighbor behaviors.
Deep expertise in observability (metrics design, tracing strategy, dashboard standards) and alert hygiene.
Experience building internal platforms or tooling that enables other teams to meet reliability standards.
Familiarity with workflow orchestration systems or durable execution platforms.
Open source contributions, especially in infrastructure or distributed systems.

Culture & Benefits

Unlimited PTO, 12 Holidays + 2 Floating Holidays
100% Premiums Coverage for Medical, Dental, and Vision
AD&D, LT & ST Disability, and Life Insurance
Empower 401K Plan
Additional perks for learning and development, lifestyle spending, home office setup, professional memberships, WFH meals, an internet stipend, and more
Calm App Subscription for Mental Health & Wellness
