Company hidden
5 days ago

Software Engineer (Site Reliability Engineering)

Work format: onsite
Employment type: full-time
Grade: principal
English: B2
Country: US
Vacancy from the Hirify.Global list (Hirify RU Global, a list of companies with Eastern European roots)

Job description

TL;DR

Software Engineer (Site Reliability Engineering): Building and optimizing high-scale cloud services with an emphasis on platform uptime, performance, and health data visualization. The focus is on turning monitoring strategy into active, high-fidelity signals for real-time alerting and incident response, and on integrating reliability testing into the software development lifecycle.

Location: Onsite in San Francisco, Seattle, Palo Alto, or Bellevue, USA

Company

hirify.global is a technology organization managing high-level frameworks to measure platform uptime and performance, bridging reporting and individual engineering teams.

What you will do

  • Provide input into long-range platform requirements and operational guidelines, making health data actionable for service owners.
  • Analyze and understand service telemetry, driving continuous improvement of health signals.
  • Partner with internal engineering teams to integrate global availability standards into monitoring pipelines and automated alerting flows.
  • Identify and mitigate onboarding friction by leveraging automated test suites for streamlined reliability signals.
  • Serve as a technical subject matter expert for centralized infrastructure services (logging, monitoring, and data platforms).
  • Lead the integration of failure signals into standard engineering workflows, ensuring that failures generate automated work items and proactive investigations.

Requirements

  • A related technical degree.
  • 5+ years of proven experience in production environments (software engineer, systems engineer, service owner, or lead developer).
  • Fluency in Java or a similar object-oriented language (Python, C++, etc.).
  • Deep understanding of telemetry systems and experience building or managing production monitoring and alerting frameworks.
  • Experience using Linux environments and the ability to navigate complex, distributed system architectures.
  • Familiarity with core web technologies: HTTP, JSON, REST, and XML.

Nice to have

  • Previous experience in a Service Owner or Technical Lead role within a high-scale, multi-tenant cloud environment.
  • Strong background in Site Reliability Engineering (SRE) principles and industry-standard availability best practices.
  • Experience with automated testing frameworks and practices (e.g., Selenium, integration testing, or chaos engineering).
  • Log parsing and data analysis experience using platforms such as Splunk or ELK.
  • Experience with SQL and relational databases (PostgreSQL, Oracle, etc.).

Culture & Benefits

  • Be part of the Availability Standards team, influencing platform uptime and performance.
  • Follow a consultative engineering approach, partnering with service owners.
  • Advocate for the customer and influence the product roadmap by ensuring world-class availability.
  • Work within a team focused on maintaining world-class availability.
