Customer Reliability Engineer (SRE)
Job description
TL;DR
Customer Reliability Engineer (SRE): Manage high-severity customer incidents and build proactive reliability tooling for a global edge network, with an emphasis on cross-layer debugging and system performance. The role focuses on designing AI-assisted diagnostics, automating incident response, and partnering with product engineering to keep infrastructure stable for enterprise clients.
Location: Must be based in or able to work from Austin, Texas (Hybrid)
Company
The company is a global leader in internet security and performance, protecting millions of websites and critical infrastructure through an intelligent, AI-native global network.
What you will do
- Own high-severity (Sev-1) customer incidents from initial signal to resolution, performing deep-dive debugging across the full stack.
- Build proactive detection and diagnostic tooling to identify systemic risks before they impact customers.
- Develop AI-native agents and automation to streamline incident analysis and reduce manual toil.
- Partner with Product Engineering to drive fixes, workarounds, and configuration changes based on customer-facing insights.
- Define and track reliability metrics to drive measurable improvements in error rates and resolution times.
- Provide technical leadership and knowledge transfer to support teams through pair-debugging and runbook documentation.
Requirements
- Minimum 5 years of experience in SRE, escalation engineering, or systems operations, with at least 2 years in customer-facing roles.
- Strong foundation in networking (TCP/IP, BGP, OSPF, DNS, HTTP/S, TLS) and security (Firewalls, VPN, Zero Trust).
- Proficiency in Linux command-line tools and packet analysis (Wireshark, tcpdump, strace).
- Strong scripting and automation skills in Bash and Python.
- Experience with observability stacks (Kibana, Elasticsearch, Grafana) and distributed tracing.
- Must be authorized to work in the U.S. without sponsorship for export-controlled technology.
Nice to have
- Experience with infrastructure-as-code (Terraform, Pulumi) and container orchestration (Kubernetes).
- Deep expertise in L3/L4 network infrastructure and L7 application protocols.
- Track record of applying AI/ML to production engineering workflows.
- Experience with cloud networking across AWS, Azure, or GCP.
Culture & Benefits
- Comprehensive health, dental, and vision insurance plans.
- 401(k) retirement savings plan with company participation.
- Flexible paid time off covering vacation and sick leave.
- Equity participation plan for eligible roles.
- Supportive environment focused on building a better, more secure Internet.