Average vacancy
The job description is detailed, but the lack of company information and salary transparency raises doubts about the role's overall appeal.
Rating from Hirify AI
Job description
Hiring: Senior Site Reliability Engineer (SRE)
Location: Bengaluru (Hybrid)
💼 Experience: 6–10 Years
⚠️ Important:
• Only local Bengaluru candidates will be considered
• Must be available for a face-to-face interview on short notice
______________
Role Overview
We are looking for a hands-on Senior SRE with deep expertise in Observability, Kubernetes, and Cloud Platforms. This role focuses on building and operating highly reliable, scalable, and observable systems in GCP (preferred) and AWS environments.
______________
🔹 Key Responsibilities
Reliability & Operations
• Design and operate highly available Kubernetes-based systems
• Define & manage SLOs, SLIs, and Error Budgets
• Lead incident response, RCA, and blameless postmortems
• Improve platform reliability through automation
Observability (Core Focus)
• Build centralized observability platforms (metrics, logs, traces)
• Hands-on experience with Prometheus, Alertmanager, and Grafana is a must
• Logging and tracing with ELK / OpenSearch, Loki, and OpenTelemetry
• Cloud-native monitoring (GCP Monitoring preferred)
• Define actionable, low-noise alerting standards
Cloud & Platform Engineering
• Infrastructure on GCP (GKE preferred) / AWS (EKS)
• Kubernetes cluster operations
• Helm deployments & Docker workloads
• Infra automation using Terraform / Ansible / Packer
Automation & Tooling
• Strong Python coding for reliability tooling
• Build internal tools for SLO tracking & incident workflows
• Integrate observability into CI/CD (Jenkins)
Leadership
• Mentor engineers
• Influence reliability architecture
• Collaborate with platform & cloud teams
______________
✅ Mandatory Skills
SRE | Python (Coding) | Kubernetes | ELK | Prometheus | Grafana | AWS/GCP | Docker | Helm | Terraform | Linux | Jenkins CI/CD
Nice to Have
Splunk | Datadog | Cribl | Vector | OpenTelemetry | Multi-cloud | Platform Security
______________
Project Highlights
✨ Build a centralized observability platform
Reduce MTTR using SLO-driven engineering
🚨 Lead production incident response
⚡ Optimize performance, scalability & cloud cost
______________
📩 Interested?
Share your CV to
Vacancy text reproduced without changes
Source -