Company hidden
4 days ago

Lead Software Engineer (Site Reliability)

Work format
onsite
Employment type
fulltime
Grade
lead
English
B2

Job description

TL;DR

Lead Software Engineer (Site Reliability): Design and implement resilient, scalable, and observable systems that keep hirify.global's software highly available and performant, with an emphasis on automating recovery, managing error budgets, and driving performance engineering. The role also covers leading incident response, championing observability, and contributing to infrastructure architecture.

Compensation: INR 0 - INR 0, yearly (placeholder values; no range is stated)

Company

hirify.global builds uncomplicated service software that delivers exceptional customer and employee experiences for over 72,000 companies worldwide.

What you will do

  • Design and implement tools to improve availability, latency, scalability, and system health.
  • Define SLIs/SLOs, manage error budgets, and lead performance engineering efforts.
  • Build and maintain automated monitoring, alerting, and remediation pipelines.
  • Collaborate with engineering teams to improve reliability by design and advocate for SRE best practices.
  • Lead incident response, root cause analysis, and blameless postmortems.
  • Contribute to infrastructure architecture, automation, and reliability roadmaps.
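For readers less familiar with the SLO and error budget terminology in the list above, here is a minimal sketch of how a request-based error budget is commonly computed. The class name, the 99.9% target, and the request counts are illustrative assumptions and do not come from the posting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorBudget:
    """Request-based error budget for one SLO window (hypothetical helper)."""

    slo_target: float     # e.g. 0.999 means 99.9% of requests must meet the SLI
    total_requests: int   # requests observed in the window
    failed_requests: int  # requests that violated the SLI

    @property
    def allowed_failures(self) -> float:
        # The budget is the share of requests the SLO permits to fail.
        return (1.0 - self.slo_target) * self.total_requests

    @property
    def remaining_fraction(self) -> float:
        # 1.0 means the budget is untouched; a negative value means it is overspent.
        allowed = self.allowed_failures
        if allowed == 0:
            # A 100% target (or an empty window) leaves no budget to spend.
            return 0.0
        return 1.0 - self.failed_requests / allowed


# Illustrative numbers only: 1,000,000 requests, 250 failures, 99.9% target.
budget = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=250)
print(f"allowed failures: {budget.allowed_failures:.0f}")
print(f"budget remaining: {budget.remaining_fraction:.1%}")
```

A team would typically slow down risky releases as `remaining_fraction` approaches zero; the exact policy is a team decision and is not described in the posting.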

Requirements

  • 7–12 years of experience in SRE, DevOps, or Production Engineering roles.
  • Proficiency in coding and in-depth knowledge of Linux for system administration and troubleshooting.
  • Practical experience with Docker and Kubernetes for application deployment and management.
  • Experience designing, implementing, and maintaining Continuous Integration and Continuous Delivery (CI/CD) pipelines.
  • Understanding of security best practices and compliance in infrastructure.
  • Expertise in designing and implementing highly available, scalable, and resilient distributed systems.
  • Proficiency in Infrastructure as Code (IaC) tools and automating infrastructure provisioning and management.
  • Deep knowledge and practical experience with various Disaster Recovery (DR) and High Availability (HA) strategies.
  • Experience implementing and utilizing monitoring, logging, and tracing tools for system health.
  • Excellent analytical and diagnostic skills for resolving complex system issues.

Nice to have

  • Degree in Computer Science, Engineering, or a related field.
  • Experience building and scaling services in production with high uptime targets (99.99%+).
  • Clear track record of reducing incident frequency and improving response metrics (MTTD/MTTR).
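To make the uptime target and response metrics above concrete, the sketch below computes the downtime a 99.99% availability target allows over a window, plus MTTD and MTTR from incident timestamps. The function names and the sample incidents are made up for illustration, and MTTR is measured here from incident start to resolution, which is one of several definitions in use.

```python
from datetime import datetime, timedelta
from statistics import mean

# (started, detected, resolved) timestamps for one incident.
Incident = tuple[datetime, datetime, datetime]


def allowed_downtime(availability: float, window: timedelta) -> timedelta:
    """Downtime permitted by an availability target over the given window."""
    return window * (1.0 - availability)


def mttd_minutes(incidents: list[Incident]) -> float:
    """Mean time to detect: average of (detected - started), in minutes."""
    return mean((d - s).total_seconds() for s, d, _ in incidents) / 60


def mttr_minutes(incidents: list[Incident]) -> float:
    """Mean time to resolve: average of (resolved - started), in minutes."""
    return mean((r - s).total_seconds() for s, _, r in incidents) / 60


# Made-up sample data, not taken from the posting.
sample = [
    (datetime(2024, 1, 5, 10, 0), datetime(2024, 1, 5, 10, 4), datetime(2024, 1, 5, 10, 34)),
    (datetime(2024, 2, 9, 22, 0), datetime(2024, 2, 9, 22, 10), datetime(2024, 2, 9, 23, 0)),
]

yearly = allowed_downtime(0.9999, timedelta(days=365))
print(f"99.99% over a year allows about {yearly.total_seconds() / 60:.1f} minutes of downtime")
print(f"MTTD: {mttd_minutes(sample):.1f} min, MTTR: {mttr_minutes(sample):.1f} min")
```

At 99.99% the yearly allowance is under an hour, which is why fast detection and recovery matter as much as reducing the number of incidents.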

Culture & Benefits

  • An environment that helps employees find their true potential, purpose, and passion.
  • Commitment to providing equal opportunity and diversity in the workplace.
  • Opportunity to build with a fresh vision and make a real impact.
