Company hidden
6 days ago

Site Reliability Engineer Lead (DevOps)

Work format
remote (India only)
Employment type
full-time
Grade
lead
English
B2
Country
India
Vacancy from the Hirify.Global list (Hirify RU Global, a list of companies with Eastern European roots)

Job description

TL;DR

Site Reliability Engineer Lead (DevOps): Establish and institutionalize enterprise-grade SRE practices and observability for business-critical applications, with an emphasis on defining SLOs, implementing monitoring stacks, and leading incident response. The role focuses on architecting resilient systems, driving operational excellence, and ensuring best-in-class uptime and customer experience for the Gold and SME platforms.

Location: Remote from India (due to regulatory requirements like RBI, CERT-IN, DPDP Act)

Company

hirify.global is seeking an SRE Lead Engineer to establish an enterprise-grade Site Reliability Engineering practice across IIFL Finance's platforms.

What you will do

  • Define and institutionalize the SRE charter, policies, and operating model across business-critical applications.
  • Design and implement service level objectives (SLOs), service level indicators (SLIs), and error budgets.
  • Architect and implement an enterprise observability stack across applications, databases, networks, and hybrid infrastructure.
  • Lead initiatives for capacity planning, chaos engineering, failover testing, and resilience validation.
  • Collaborate with application, DevSecOps, security, and infrastructure teams to embed SRE practices in the SDLC.
  • Build and lead a small team of SRE engineers.

Requirements

  • 7+ years of hands-on experience with hyperscale cloud services (e.g., AWS, AKS, Azure Monitor) and on-prem workloads.
  • Expert-level knowledge of logging, metrics (e.g., Datadog, AppDynamics, Prometheus/Grafana), tracing, and incident analytics at scale.
  • Proficiency in Python, PowerShell, Ansible, Terraform, and CI/CD integration.
  • Strong knowledge of microservices, containers (Kubernetes, Docker), message queues, and databases.
  • Proven ability to lead incident response, perform RCA, and design proactive reliability measures.
  • Understanding of Indian regulatory requirements (e.g., RBI, CERT-IN, DPDP Act) is required.

Culture & Benefits

  • Prioritize stability, resilience, and uptime while balancing innovation and delivery speed.
  • Embrace data-driven decision making, iterative enhancements, and blameless postmortems.
  • Work seamlessly with application, DevSecOps, and infrastructure teams to align goals.
  • Focus on customer-centric reliability, framing SLIs in terms of business impact.
  • Opportunity to define roadmap, mentor junior SREs, and drive enterprise-wide adoption of best practices.
