Company hidden
4 days ago

Lead Software Engineer - Site Reliability (SRE)

Employment type: full-time
Grade: lead
English: B2
Listed in Hirify Global, a list of international tech companies.

Job description


TL;DR

Lead Software Engineer - Site Reliability (SRE): designing for resilience, automating recovery, and ensuring system stability and observability at scale, with an emphasis on SLIs/SLOs and performance engineering. The role focuses on building automated monitoring pipelines, leading incident response, and implementing high-availability distributed systems.

Company

hirify.global builds uncomplicated service software that delivers exceptional employee and customer experiences through enterprise-grade CX and IT solutions.

What you will do

  • Design and implement tools to improve system availability, latency, scalability, and overall health.
  • Define SLIs/SLOs, manage error budgets, and drive performance engineering efforts.
  • Build and maintain automated monitoring, alerting, and remediation pipelines.
  • Lead incident response, perform root cause analysis, and drive blameless postmortems.
  • Champion observability across services using logs, metrics, and traces.
  • Contribute to infrastructure architecture, automation, and reliability roadmaps.

Requirements

  • 7–12 years of experience in SRE, DevOps, or Production Engineering roles.
  • Strong coding proficiency and in-depth Linux expertise for advanced troubleshooting.
  • Practical experience with Docker and Kubernetes for application deployment and orchestration.
  • Experience designing and maintaining Continuous Integration and Continuous Delivery (CI/CD) pipelines.
  • Proficiency in Infrastructure as Code (IaC) tools and infrastructure automation.
  • Deep knowledge of Disaster Recovery (DR) and High Availability (HA) strategies for distributed systems.

Nice to have

  • Degree in Computer Science, Engineering, or a related field.
  • Experience scaling services in production with high uptime targets (99.99%+).
  • Proven track record of reducing incident frequency and improving MTTD/MTTR metrics.

Culture & Benefits

  • Inclusive environment welcoming colleagues of all backgrounds, genders, and orientations.
  • Commitment to equal opportunity and workplace diversity.
  • People-first approach to AI and a culture of reducing complexity.

Be careful: if an employer asks you to sign in to their system with iCloud/Google, send a code or password, or run code or software, do not do it; these are scammers. Be sure to click "Report" or contact support.