This vacancy is archived
Senior Site Reliability Engineer (Platform Resilience)
Job description
TL;DR
Senior Site Reliability Engineer (Platform Resilience): Designing, building, scaling, and maturing a multi-cloud platform for internal and external services, with an emphasis on automating system engineering efforts to guarantee reliability. Focus on growing global infrastructure to meet scaling demands and developing software and tooling to support rapid product deployment.
Location: Must be based in Spain, Greece, Ireland, Norway, Poland, Portugal, Sweden, Australia, or New Zealand. Cannot be located in Belarus, Cuba, Iran, North Korea, Syria, or Russia, including annexed Ukrainian territories.
Company
The company is the Search AI Company, enabling real-time answers using all data at scale through its cloud-based platform for search, security, and observability.
What you will do
- Lead technical initiatives for automating system engineering efforts to guarantee global infrastructure reliability.
- Grow global platform infrastructure to meet increasing scaling demands by developing and maintaining software, tooling, and automations.
- Champion an environment focused on collaboration, operational excellence, and uplifting others.
- Respond to major incidents and prevent repeated customer impact through prioritised problem management, participating in a follow-the-sun on-call rotation.
Requirements
- Background in software engineering to collaborate with engineers to identify, implement, and deliver solutions.
- Experience operating a SaaS product in a public cloud, ideally built using Infrastructure-as-Code tooling such as Crossplane or Terraform.
- Experience building or operating Kubernetes-at-scale infrastructure, ideally across multiple cloud providers, and supporting automation.
- Proficiency in Golang or other programming languages.
- Experience with containerized services such as Docker.
- Proven experience in leading and improving alerting and major incident management standard processes, and experience with metrics systems (e.g., Stack, Graphite, Prometheus, Influx).
- Professional skills in Linux system administration on distributed systems at scale.
- Experience thriving in a self-organizing, globally distributed team environment that values sharing.
Nice to have
- Experience in public cloud and managed Kubernetes services.
Culture & Benefits
- Competitive pay based on work performed.
- Health coverage for you and your family in many locations.
- Ability to craft your calendar with flexible locations and schedules.
- Generous number of vacation days each year.
- Matched financial donations and service up to $2000 (or local equivalent).
- Up to 40 hours each year for volunteer projects.
- Minimum 16 weeks of parental leave.