Senior Customer Reliability Engineer (AI)
Job description
TL;DR
Senior Customer Reliability Engineer (AI): Lead high-severity incident response and proactive reliability engineering for critical infrastructure, with an emphasis on deep-stack debugging across the edge, network, and application layers. Build AI-native diagnostic tooling and automation to identify systemic risks and reduce toil for enterprise customers.
Location: Must be based in Singapore (Hybrid role).
Company
The company is a global leader in Internet security and performance, protecting millions of websites and critical infrastructure through an intelligent, AI-driven global network.
What you will do
- Own high-severity customer incidents end-to-end, performing root cause analysis across the full stack.
- Develop proactive monitoring, detection, and diagnostic capabilities to identify systemic risks before they impact customers.
- Partner with Product Engineering to drive fixes, workarounds, and configuration changes based on incident insights.
- Build and iterate on AI-native agents and tooling to pre-diagnose incidents and propose evidence-based fixes.
- Define and track customer-facing reliability metrics to drive measurable improvements in system stability.
- Provide technical leadership and knowledge transfer to Customer Support teams to raise the overall technical floor.
Requirements
- Minimum 5 years of experience in SRE, escalation engineering, or systems operations, with at least 2 years in customer-facing roles.
- Strong foundation in networking and security (TCP/IP, DNS, HTTP/S, BGP, OSPF, firewalls, VPNs).
- Proficiency with observability and diagnostic tools (Wireshark, tcpdump, Kibana, Grafana, distributed tracing).
- Strong scripting and automation skills in Bash and Python.
- Experience with incident management, postmortem culture, and SLO/SLI-based reliability practices.
- Must be based in Singapore and comfortable with a hybrid work model.
Nice to have
- Experience applying AI/ML to production engineering or operational workflows.
- Deep expertise in both L3/L4 network infrastructure and L7 application protocols.
- Familiarity with CI/CD pipelines, Terraform, and Kubernetes.
- Experience with cloud networking across AWS, Azure, or GCP.
Culture & Benefits
- Mission-driven environment focused on building a better, more secure Internet.
- Opportunity to work on critical infrastructure used by governments, banks, and health systems.
- Emphasis on AI-native engineering and rapid iteration.
- Commitment to diversity, inclusion, and equal opportunity employment.