TL;DR

Principal Site Reliability Operations Engineer (SRE): Managing and resolving production incidents, improving incident processes, and enhancing system architecture for %hirify_global%'s distributed environment, with an emphasis on maintaining uptime, driving incidents to resolution, and post-mortem analysis. The role focuses on troubleshooting complex technical challenges, automating routine tasks, and mentoring junior team members.

Location: Hybrid in San Mateo, CA, United States (onsite Tuesday, Wednesday, Thursday)

Salary: $226,450–$262,150 USD (Annual)

Company

%hirify_global% is a platform that empowers a global community of developers and creators to build immersive 3D digital experiences, connecting tens of millions of people daily.

What you will do

Lead and manage production incidents.
Collaborate cross-functionally to troubleshoot and resolve sophisticated technical challenges.
Guide the implementation of incident management processes and procedures.
Continually monitor system health, performance, and capacity.
Conduct comprehensive post-mortem analysis to ascertain root causes and formulate corrective measures.
Contribute substantially to the design and enhancement of system architecture to boost reliability and performance.
Use coding skills to automate routine daily tasks and improve system efficiency.
Serve in the Incident Manager On-Call rotation and mentor junior team members.

Requirements

8+ years of experience in a comparable role on a site reliability team.
Advanced knowledge of systems and network infrastructure protocols.
Demonstrated ability in managing, troubleshooting, and resolving incidents in distributed environments.
Familiarity with Python, Go, or a similar scripting or programming language for automating routine tasks.
Bachelor's degree or equivalent experience in Computer Science, Computer Engineering, or a similar technical field.
Excellent communication skills, including the ability to distill complex technical issues into clear, concise language.

Culture & Benefits

Shape the future of human interaction and solve unique technical challenges at scale.
Eligible for equity compensation and comprehensive benefits.
Work in a complex distributed environment full of continuous change.
Opportunity to connect a billion people with optimism and civility.