Site Reliability Engineer I
Job description
TL;DR
Site Reliability Engineer I (Observability/Security): Own the availability of planet-scale observability and security products, with an emphasis on operational excellence, cloud resource optimization, security hardening, and developer velocity. Focus on improving the microservices lifecycle, defining and managing SLOs, writing automation to eliminate toil, and facilitating incident response.
San Jose, Costa Rica - Remote
Company
The company unifies security and operational data through its Intelligent Operations Platform, empowering teams to detect, investigate, and resolve cybersecurity and cloud operations challenges.
What you will do
- Improve the lifecycle of microservices and architectural components, from design through operation and refinement.
- Define, evolve, and manage SLOs with the global SRE team.
- Write code and automation to reduce operational workload, boost efficiency, enhance security, and accelerate feature delivery.
- Scale systems sustainably via automation and evolve them for better reliability and velocity.
- Facilitate blame-free root cause analysis for incidents and drive improvements.
- Participate in global incident response coordination and drive issue resolution across teams.
Requirements
- Cloud native application development experience with best practices and design patterns.
- Strong debugging and troubleshooting across the technology stack.
- Understanding of AWS Networking, Compute, Storage, and managed services.
- Experience with CI/CD and infrastructure tooling such as Kubernetes, Terraform, Ansible, and Jenkins.
- Infrastructure as Code with Terraform or CloudFormation.
- Full lifecycle support of services from creation to production.
- Production-ready code in Java, Scala, or Go.
- Linux systems proficiency and command line expertise.
- Modern cloud-native software security approaches.
- Agile frameworks like Scrum and Kanban.
- Bachelor’s or Master’s in Computer Science, Electrical Engineering, or related field.
- 1+ years of industry experience.
Nice to have
- Experience with observability products.
- Planet-scale product development.
- Running SaaS on AWS at expert level.
- Streaming technologies like Kafka, Kafka Streams, KSQL.
- Advanced Java, Go, Scala, or Python.
- Advanced Terraform, Jenkins, Kubernetes.
- Extensive JVM workloads at scale.
Culture & Benefits
- Work in a fast-paced, iterative environment with a global SRE team.
- Flexible roles for people willing to learn the products.
- Focus on blame-free postmortems and continuous improvement.