Company hidden
Updated 11 hours ago

Staff Site Reliability Engineer

$119,000 - $170,000
Work format
remote (USA only)/hybrid
Employment type
full-time
Grade
lead
English
B2
Country
US

Job description

TL;DR

Staff Site Reliability Engineer: Responsible for all aspects of hirify.global's production data center services, including servers, operating systems, storage, and supporting systems, with an emphasis on availability, latency, performance, efficiency, and scalability. The role focuses on incident response, troubleshooting, reducing toil, and driving systemic fixes through platform engineering.

Location: Remote (USA); San Jose, California, USA

Salary: $119,000 - $170,000 USD

Company

hirify.global accelerates digital transformation so that its customers can be more agile, efficient, resilient, and secure.

What you will do

  • Own the reliability of a large-scale cloud service by partnering with Engineering and Network teams to define requirements early and conduct operability reviews.
  • Develop and operate end-to-end observability and incident tooling to manage SLOs/error budgets, reduce alert noise, and improve detection and diagnosis of system issues.
  • Participate in an on-call rotation to lead full-cycle incident response and perform deep cross-stack troubleshooting to drive permanent software fixes.
  • Build and maintain everything-as-code for fleet and service lifecycle, driving provisioning, configuration, release automation, and complex rollout/rollback workflows.
  • Continuously improve platform hygiene through consistent OS/app upgrades, dependency/vulnerability patching, capacity and performance tuning, and strict CI/CD validation prior to production rollouts.

Requirements

  • US Citizenship is required due to the nature of assigned customers.
  • 5+ years of industry experience in software engineering, infrastructure software, and/or platform engineering.
  • Proficiency in at least one programming language (such as Python, Bash, or Go) with demonstrated ability to write production-quality code.
  • Strong Linux/Unix systems fundamentals and solid understanding of networking protocols and components.
  • Proven experience operating production services and ability to participate in on-call rotations and support occasional after-hours or weekend deployments.
  • Experience managing BSD in production, with a focus on driving systemic fixes through platform engineering.

Nice to have

  • Proven expertise in operating Kubernetes at scale.
  • Deep experience with the Prometheus/OpenTelemetry ecosystems, including instrumenting golden signals, defining SLOs, and performing alert tuning to ensure high-availability environments.

Culture & Benefits

  • Comprehensive and inclusive benefits to meet the diverse needs of employees and their families throughout their life stages.
  • Committed to building a team that reflects the communities served and the customers worked with.
  • Foster an inclusive environment that values all backgrounds and perspectives, emphasizing collaboration and belonging.
