Lead Software Engineer (Site Reliability)

Work format

onsite

Employment type

fulltime

Grade

lead

English

Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

Lead Software Engineer (Site Reliability): Designing and implementing resilient, scalable, and observable systems to ensure high uptime and performance for hirify.global's software, with an emphasis on automating recovery, managing error budgets, and driving performance engineering. The role also covers leading incident response, championing observability, and contributing to infrastructure architecture.

Compensation: not disclosed (listed as INR 0 - INR 0, yearly)

Company

hirify.global builds uncomplicated service software that delivers exceptional customer and employee experiences for over 72,000 companies worldwide.

What you will do

Design and implement tools to improve availability, latency, scalability, and system health.
Define SLIs/SLOs, manage error budgets, and lead performance engineering efforts.
Build and maintain automated monitoring, alerting, and remediation pipelines.
Collaborate with engineering teams to improve reliability by design and advocate for SRE best practices.
Lead incident response, root cause analysis, and blameless postmortems.
Contribute to infrastructure architecture, automation, and reliability roadmaps.

Requirements

7–12 years of experience in SRE, DevOps, or Production Engineering roles.
Proficiency in coding and in-depth knowledge of Linux for system administration and troubleshooting.
Practical experience with Docker and Kubernetes for application deployment and management.
Experience designing, implementing, and maintaining Continuous Integration and Continuous Delivery (CI/CD) pipelines.
Understanding of security best practices and compliance in infrastructure.
Expertise in designing and implementing highly available, scalable, and resilient distributed systems.
Proficiency in Infrastructure as Code (IaC) tools and automating infrastructure provisioning and management.
Deep knowledge and practical experience with various Disaster Recovery (DR) and High Availability (HA) strategies.
Experience implementing and utilizing monitoring, logging, and tracing tools for system health.
Excellent analytical and diagnostic skills for resolving complex system issues.

Nice to have

Degree in Computer Science, Engineering, or a related field.
Experience building and scaling services in production with high uptime targets (99.99%+).
Clear track record of reducing incident frequency and improving response metrics (MTTD/MTTR).

Culture & Benefits

An environment that helps employees find their true potential, purpose, and passion.
Commitment to providing equal opportunity and diversity in the workplace.
Opportunity to build with a fresh vision and make a real impact.
