TL;DR
Lead Software Engineer - Site Reliability (SRE): Designing and maintaining resilient, high-scale distributed systems with an emphasis on observability, automation, and incident response. Focus on defining SLOs, implementing CI/CD pipelines, and driving performance engineering to ensure 99.99%+ system availability.
Company
hirify.global builds powerful, uncomplicated service software that delivers exceptional customer and employee experiences for businesses worldwide.
What you will do
- Design and implement tools to enhance system availability, latency, and scalability.
- Define SLIs/SLOs, manage error budgets, and lead performance engineering initiatives.
- Build and maintain automated monitoring, alerting, and remediation pipelines.
- Lead incident response, root cause analysis, and blameless postmortems.
- Champion observability practices across logs, metrics, and traces.
- Contribute to infrastructure architecture, automation, and reliability roadmaps.
Requirements
- 7–12 years of experience in SRE, DevOps, or Production Engineering.
- Strong proficiency in Linux system administration and advanced troubleshooting.
- Practical experience with Docker and Kubernetes for application management.
- Expertise in designing CI/CD pipelines and working with Infrastructure as Code (IaC) tools.
- Experience designing resilient distributed systems and disaster recovery strategies.
- Proficiency with AI-assisted development tools to enhance engineering velocity.
Culture & Benefits
- Inclusive environment that welcomes colleagues of all backgrounds and identities.
- Commitment to equal opportunity and workplace diversity.
- Opportunity to work on enterprise-grade software serving over 72,000 global customers.
- Environment dedicated to professional growth, purpose, and impact.