Site Reliability Engineer (AI)
Job description
TL;DR
Site Reliability Engineer (AI): Build and maintain the reliability of a next-generation agentic clinical AI assistant, with an emphasis on incident response and toil reduction. The role focuses on debugging complex Kubernetes failure modes, optimizing observability signals, and automating infrastructure to ensure clinical-grade stability.
Location: Hybrid in Amsterdam or Zurich
Company
The company develops an agentic clinical AI assistant designed to help clinicians reason across patient data, guidelines, and diagnostics in real clinical environments.
What you will do
- Own the reactive frontline by carrying on-call, triaging alerts, and driving incidents to resolution.
- Improve alert quality by transforming noisy signals into precise, actionable alerting.
- Lead blameless postmortems and implement durable fixes across technical teams.
- Develop automation and tooling to remove toil and improve runbooks and dashboards.
- Shape the SRE program and observability standards for a growing AI platform.
Requirements
- Deep Kubernetes experience, specifically debugging networking, resource pressure, and control-plane issues.
- Strong Linux fundamentals with the ability to debug at the OS level.
- Solid programming ability to build maintainable automation and tooling.
- Fluency in observability (metrics, logs, traces) and incident-response workflows.
- Internalized SRE principles, including SLO/SLI and error-budget thinking.
- Must be based in or able to work from Amsterdam or Zurich.
Nice to have
- Infrastructure-as-Code and CI/CD fluency (Terraform, Helm, GitOps).
- Experience with incident-management tooling like PagerDuty, Rootly, or incident.io.
- Exposure to regulated or high-stakes domains such as healthcare or fintech.
- Experience with structured logging and modern alerting practices.
Culture & Benefits
- Competitive salary, comprehensive pension plan, and 25 vacation days per year.
- EUR 1000 annual learning and development budget.
- Annual commuting subsidy.
- High level of autonomy and flexibility regarding work hours.
- Regular offsites and team events to celebrate success.