This vacancy is archived

Company hidden
Updated 22 days ago

Site Reliability Engineer Lead (DevOps)

Work format
remote (India only)
Employment type
fulltime
Grade
lead
English
b2
Country
India

Job description


TL;DR

Site Reliability Engineer Lead (DevOps): Establish and institutionalize enterprise-grade SRE practices and observability for business-critical applications, with an emphasis on defining SLOs, implementing monitoring stacks, and leading incident response. The role focuses on architecting resilient systems, driving operational excellence, and ensuring best-in-class uptime and customer experience for the Gold and SME platforms.

Location: Remote from India (due to regulatory requirements such as those of the RBI, CERT-In, and the DPDP Act)

Company

hirify.global is seeking an SRE Lead Engineer to establish an enterprise-grade Site Reliability Engineering practice within IIFL Finance's platforms.

What you will do

  • Define and institutionalize the SRE charter, policies, and operating model across business-critical applications.
  • Design and implement service level objectives (SLOs), service level indicators (SLIs), and error budgets.
  • Architect and implement an enterprise observability stack across applications, databases, networks, and hybrid infrastructure.
  • Lead initiatives for capacity planning, chaos engineering, failover testing, and resilience validation.
  • Collaborate with application, DevSecOps, security, and infrastructure teams to embed SRE practices in the SDLC.
  • Build and lead a small team of SREs.

Requirements

  • 7+ years of hands-on experience with hyperscale cloud services (e.g., AWS, AKS, Azure Monitor) and on-prem workloads.
  • Expert-level knowledge of logging, metrics (e.g., Datadog, AppDynamics, Prometheus/Grafana), tracing, and incident analytics at scale.
  • Proficiency in Python, PowerShell, Ansible, Terraform, and CI/CD integration.
  • Strong knowledge of microservices, containers (Kubernetes, Docker), message queues, and databases.
  • Proven ability to lead incident response, perform RCA, and design proactive reliability measures.
  • Understanding of Indian regulatory requirements (e.g., RBI, CERT-In, DPDP Act) is required.

Culture & Benefits

  • Prioritize stability, resilience, and uptime while balancing innovation and delivery speed.
  • Embrace data-driven decision making, iterative enhancements, and blameless postmortems.
  • Work seamlessly with application, DevSecOps, and infrastructure teams to align goals.
  • Focus on customer-centric reliability, framing SLIs in terms of business impact.
  • Opportunity to define roadmap, mentor junior SREs, and drive enterprise-wide adoption of best practices.