Senior SRE Engineer (Observability Focus)
Job description
TL;DR
Senior SRE Engineer (Observability Focus): Own end-to-end observability by designing and operating the telemetry stack that provides production visibility across hybrid AWS and on-prem environments, with an emphasis on metrics, logs, and traces at scale. Focus on building VictoriaMetrics/OpenSearch/OpenTelemetry pipelines, running Kafka-based telemetry transport, and delivering Grafana dashboards and alerting that engineers actually use.
Location: Warsaw (Poland) / Sofia (Bulgaria) / Limassol (Cyprus) / Remote
Company
The company is a trading platform expanding globally with a focus on cutting-edge technology and client experience.
What you will do
- Own the full observability stack: VictoriaMetrics (metrics), OpenSearch (logs), and OpenTelemetry (traces) from pipeline design to day-2 operations.
- Architect and operate VictoriaMetrics clusters, including vmagent scraping, remote write, vmalert rules, and cardinality control.
- Operate OpenSearch clusters with ISM, hot-warm-cold architecture, shard tuning, and ingest pipelines via Data Prepper.
- Build and maintain OpenTelemetry Collector pipelines (receivers/processors/exporters) and instrument services across Java, Python, and JS/TS.
- Run Kafka as the telemetry transport layer (topic design, partition strategy, consumer lag monitoring, throughput tuning).
- Build Grafana dashboards/alerting and improve sampling strategies, batching, and context propagation; contribute to incident response and post-mortems.
Requirements
- 6+ years in DevOps/SRE/platform engineering, including 2+ years focused on observability tooling at production scale.
- Hands-on VictoriaMetrics (or Prometheus) expertise: MetricsQL/PromQL, exporters, service discovery, remote write, downsampling, and retention management.
- Solid OpenSearch/Elasticsearch skills: cluster operations, Query DSL, ISM policies, and ingest pipeline design.
- Production experience with OpenTelemetry: Collector configuration, OTLP, context propagation, and instrumentation across multiple languages.
- Strong Kafka skills: producer/consumer patterns, consumer group management, Kafka Connect, Schema Registry, and JMX monitoring (Strimzi on Kubernetes is a plus).
- Working knowledge of Kubernetes (operators, Helm), Argo CD/GitOps, and Terraform/Ansible; scripting in Bash or Python for automation.
Culture & Benefits
- Hybrid work model with additional workation days to work remotely from anywhere (restrictions apply).
- Generous time off with an annual leave policy and extra paid volunteer days.
- Comprehensive health and pension benefits, including location-specific perks.
- Employee referral program.