Observability Platform Engineer (AI)
Job description
TL;DR
Observability Platform Engineer (AI): design, deploy, and operate global-scale monitoring, logging, and tracing systems for GPU-powered AI infrastructure, with an emphasis on scalability and automation. The focus is on instrumenting distributed systems, automating observability through Infrastructure-as-Code (IaC), and improving incident detection and resolution.
Location: UK
Company
The company is a GPU cloud provider engineered for AI, providing high-performance infrastructure to AI start-ups and large enterprise customers.
What you will do
- Design and maintain global-scale observability platforms including monitoring, logging, tracing, and alerting.
- Deploy and manage tools such as Prometheus, Grafana, Datadog, ELK/OpenSearch, OpenTelemetry, and Jaeger.
- Automate observability infrastructure using Infrastructure-as-Code and CI/CD pipelines.
- Partner with SRE and Engineering teams to instrument applications and systems for telemetry.
- Develop real-time dashboards and alerts to provide visibility into infrastructure health.
- Document observability standards, tools, and processes.
Requirements
- Strong experience in designing and operating observability platforms at scale.
- Hands-on expertise with Prometheus, Grafana, Datadog, ELK/OpenSearch, OpenTelemetry, or Jaeger.
- Experience with cloud-native infrastructure including Kubernetes, containers, and service meshes.
- Proficiency in scripting and automation using Python, Go, or Bash.
- Knowledge of Infrastructure-as-Code tools like Terraform, Ansible, or Pulumi.
- Must be located in the UK.
Nice to have
- Experience with AI/ML workload observability.
- Familiarity with hyperscale datacenter environments.
- Knowledge of AIOps and advanced telemetry analytics.
- Exposure to sustainability monitoring and efficiency metrics.
Culture & Benefits
- Culture of relentless innovation, ownership, and accountability.
- Environment built on openness, transparency, and excellence.
- Commitment to an inclusive, diverse, and equitable workplace.