Senior Site Reliability Engineer (Golang)
Job description
TL;DR
Senior Site Reliability Engineer (Golang): Design and evolve the automation framework and tooling for the Mission Control platform, with an emphasis on scalability, reliability, and reducing operational toil. The role focuses on building self-service APIs, implementing SLO/SLA measurements, and integrating AI-assisted capabilities into operational workflows.
Location: Remote within Germany or Switzerland
Company
The company provides secure connectivity (SASE) by combining SD-WAN, Firewall, SWG, CASB, and ZTNA into a managed 24x7 service.
What you will do
- Design, build, and evolve the automation framework and tooling for the Mission Control platform using Golang.
- Develop self-service APIs and operational tooling to enable production operations without manual intervention.
- Apply and improve SRE practices, including SLIs, SLOs, error budgets, and SLA measurements.
- Lead reliability and automation projects independently from problem identification through long-term operation.
- Collaborate with the AI team to integrate AI-assisted capabilities into automation and operational workflows.
- Participate in incident response and on-call rotations, leading root cause analysis and preventive actions.
Requirements
- Strong software engineering background with production-grade Go (Golang) experience.
- Solid understanding of distributed systems and scalable architecture.
- Proven experience designing and operating production services and APIs (REST, gRPC).
- Hands-on experience with Kubernetes, Terraform, GitOps, and CI/CD tooling.
- Proficiency with observability stacks (Prometheus, Loki, Tempo) and major cloud platforms (AWS, Azure, GCP).
- Knowledge of Linux system administration, networking concepts, and major Internet protocols (TCP/IP, IPsec, SSL, DNS).
Nice to have
- Experience building internal developer platforms or automation systems.
- Interest in integrating AI/LLM-assisted capabilities into operational tooling.
- Background in networking or security operations.
- Experience operating SLO-based reliability programs at scale.
Culture & Benefits
- Collaborative environment focused on simplifying secure connectivity and customer safety.
- Opportunities for skill development and professional career advancement.
- Flexible workload options (80–100% employment).
- Culture of unconventional thinking and solving complex problems together.