Staff Site Reliability Engineer

Work format

Remote (USA only)

Employment type

Full-time

Grade

Senior

English

This vacancy is from Hirify Global, a list of international tech companies.

Job description

TL;DR

Staff Site Reliability Engineer (AWS/Kubernetes): Establish and evolve SRE best practices across the organization, including reliability principles, error budgets, incident response, and observability strategy, with an emphasis on SLIs/SLOs, alerting, dashboards, and automation. Focus on designing software-driven infrastructure solutions, leading large initiatives, and improving platform resilience, scalability, and developer workflows.

Location: Remote

Company

hirify.global is the fastest-growing healthcare technology company, building products that make prescriptions accessible and affordable, including the BlinkRx pharma-to-patient cloud and Quick Save for better access to medications.

What you will do

Establish and evolve SRE best practices, including reliability principles, error budgets, incident response, postmortems, and operational readiness.
Define and drive observability strategy for system health with SLIs/SLOs, alerting, dashboards, and service indicators.
Design and implement software-driven infrastructure solutions to automate processes and reduce toil.
Act as a technical leader, set priorities, and influence decisions across cloud infrastructure, reliability tooling, and platform architecture.
Own large, ambiguous initiatives from concept to delivery, aligning stakeholders in engineering, security, and product.
Improve platform resilience, scalability, performance, and compliance; identify risks and lead upgrades.
Partner with teams to enhance developer workflows, tooling, and operational maturity; provide mentorship and code reviews.
Lead incident response, escalation, postmortems, and knowledge sharing through documentation.

Requirements

Bachelor’s or Master’s in Computer Science or equivalent; 7+ years in SRE, infrastructure, or platform engineering at scale.
Expert troubleshooting across the full stack: application, kernel, network; strong Linux and OS fundamentals.
Advanced networking: load balancing, proxies, DNS, TCP/IP, NAT, service communication.
Experience in Python, Go, Bash; automating operations; building internal tools.
Deep cloud experience (AWS preferred; GCP/Azure acceptable), Kubernetes (EKS, Helm), observability systems, containers, microservices.
IaC with Terraform, Pulumi, CloudFormation, or Ansible; holistic infrastructure design for cost, reliability, security.

Culture & Benefits

Highly collaborative team of builders and operators finding new ways to innovate in healthcare.
Impact millions of patients at the intersection of healthcare and finance; help build a generational company.
A cross-functional environment that is curious, aggressively collaborative, and relentlessly learning.
An equal opportunity employer that values diversity.