Principal Site Reliability Engineer (Cybersecurity)
Job description
TL;DR
Principal Site Reliability Engineer (Cybersecurity): Building and optimizing high-availability, scalable cloud infrastructure across multi-cloud environments with an emphasis on automation and observability. Focus on reducing Mean Time to Mitigate (MTTM), implementing self-healing systems, and maturing architectural standards for a global security platform.
Location: Must be based in the USA. Hybrid (3 days a week in San Jose, CA) or Remote options available.
Salary: $164,500 – $235,000 USD
Company
An AI-forward enterprise providing a cloud-native Zero Trust Exchange platform to secure digital transformation for millions of users.
What you will do
- Design and implement highly available, scalable infrastructure across AWS, Azure, GCP, and bare-metal environments.
- Drive an "automation-first" culture by writing Python and Go code to eliminate manual toil.
- Implement sophisticated observability using Prometheus, Grafana, and OpenTelemetry, defining SLIs/SLOs and error budgets.
- Serve as a lead Incident Commander, develop response playbooks, and conduct deep-dive post-incident analyses.
- Partner with Engineering teams to perform technical operability reviews.
Requirements
- 10+ years of experience managing reliability, scalability, and availability for large-scale production services.
- Deep expertise in programming with Python, Go, or C/C++.
- Strong background in networking protocols, Linux/FreeBSD systems, and distributed architecture.
- Experience in high-stakes incident management and participation in a 24/7 on-call rotation.
- Proficiency in ITIL frameworks to drive service maturity through systematic problem management.
- Authorized to work in the US.
Nice to have
- Extensive experience with Infrastructure-as-Code (Ansible, Terraform) and public clouds.
- Experience with chaos engineering and disaster recovery planning at scale.
- Expertise in global routing (BGP), traffic tunneling (GRE, IPsec), and L7 proxy architectures (HAProxy).
Culture & Benefits
- Comprehensive and inclusive benefits program supporting families through all life stages.
- A high-accountability culture that values customer obsession and impact over activity.
- Environment based on transparency, trust, and constructive, honest debate.
- Flexible work arrangements including remote and hybrid options.