Company hidden
8 hours ago

Senior Site Reliability Engineer

$129,098 - $189,343
Work format
hybrid
Employment type
full-time
Grade
senior
English
B2
Country
US/Canada
Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

Senior Site Reliability Engineer (SRE): Ensure the IFIaaS SaaS platform is reliable, available, and performant, with an emphasis on SLO/SLI ownership, end-to-end observability, and incident response across distributed on-prem/hybrid environments. The role focuses on designing HA/DR failover, automating runbooks and provisioning, and maintaining secure, compliant operations while reducing MTTA/MTTR and toil.

Location: Hybrid — three in-office days per week in Minneapolis, Ottawa, Colorado, or Dallas (primary posting location: Shakopee, MN).

Salary: $129,098-$189,343 per year (US).

Company

hirify.global provides identity-centric security solutions, enabling trusted identities, payments, and data protection.

What you will do

  • Own SLOs/SLIs for availability (99.9%), latency, error rate, and quality of service across microservices.
  • Design and operate observability (metrics, logs, traces, synthetic checks, and real-user monitoring) and instrument services with structured logs and trace context.
  • Build health probes, SLA monitors, and on-call/monitoring tooling (e.g., Splunk On-Call, Prometheus, Datadog), and use metrics to detect and diagnose issues.
  • Lead incident response (triage, communications, coordination, mitigation) and run blameless postmortems with actionable follow-ups.
  • Maintain and improve runbooks, escalation paths, paging policies, and MTTA/MTTR reduction programs; implement war room protocols during incidents.
  • Automate provisioning and configuration drift detection/correction; manage patching, backups/restores (RPO/RTO), and compliance evidence for PCI-DSS/PCI-CP and SOC 2/ISO 27001.

Requirements

  • 5+ years of experience in SRE, DevOps, or software engineering supporting distributed, production-grade environments, including troubleshooting microservices and Windows/VMware systems in on-prem/hybrid infrastructure.
  • Hands-on automation and observability experience, including Terraform/Ansible/DSC, CI/CD, and enterprise monitoring/logs/metrics/tracing tools (e.g., Datadog, Prometheus, Splunk).
  • Infrastructure automation proficiency (e.g., Terraform, Ansible, Jenkins, Octopus, PowerShell DSC).
  • Proficiency in VMware, Windows Server administration, networking fundamentals, and system-level performance analysis.
  • Hands-on experience operating and troubleshooting enterprise microservices, APIs, and distributed application stacks in on-prem/hybrid environments.
  • Must provide after-hours production support on a rotational basis to ensure 24/7/365 availability.

Nice to have

  • Experience operating in compliance-sensitive environments (PCI-DSS, PCI-CP, SOC 2) with strong integrity and accountability.
  • Leadership and communication skills, including leading by example and driving operational excellence.

Culture & Benefits

  • Hybrid flexibility with three in-office days per week; distributed workforce.
  • Comprehensive US health and well-being programs, including medical, vision, dental, and 401(k) matching.
  • Paid personal time off plus 12 paid holidays, parental leave, life/disability insurance, and education reimbursement (eligibility applies).
  • Discretionary annual incentive plan eligibility.
  • Focus on operational excellence, blameless postmortems, and continuous improvement.

Hiring process

  • Recruiter screen, followed by interviews assessing SRE, observability, and incident response experience and operational practices.
  • Compensation and eligibility details discussed with the recruiter.
