Senior Site Reliability Engineer
Job description
TL;DR
Senior Site Reliability Engineer (Data Pipelines): build and operate infrastructure and tooling for large-scale data ingestion pipelines, with an emphasis on automation, observability, CI/CD, and reliability at scale. The role focuses on Kubernetes-based provisioning, monitoring with Grafana/ELK, incident response, and SLO tracking for production systems.
Company
The company develops and deploys systematic financial strategies across global markets, using a proprietary research platform to produce high-quality predictive signals (alphas).
What you will do
- Design and develop automation, monitoring, CI/CD pipelines, and reliability features for data onboarding pipelines (70% build).
- Build observability solutions using Grafana, the ELK stack, and Vector; implement infrastructure-as-code for Kubernetes and bare-metal hosts.
- Integrate tools such as Redis, Celery, and MySQL; develop internal services that reduce toil.
- Participate in on-call rotations, respond to incidents, drive post-mortems, and define SLOs/SLIs (30% operate).
- Diagnose performance issues, create runbooks, plan capacity, and optimize resources.
- Collaborate with engineering, analyst, and research teams to ensure pipeline reliability.
Requirements
- 8+ years in SRE, DevOps, or platform engineering.
- Expertise in Linux for infrastructure management and troubleshooting.
- Strong Python for scripting, CLI tools, and automation.
- Deep Kubernetes and Docker experience: deploying, scaling, debugging production workloads.
- Observability: Grafana, Prometheus, ELK; dashboards, alerts, SLOs.
- CI/CD (GitLab CI), IaC (Ansible), databases (MySQL/PostgreSQL), and messaging/task queues (Kafka, Redis, Celery).
- Networking, APIs, incident management, on-call experience.
- Leadership: mentoring, roadmaps, cross-team coordination.
- Openness to using AI agents and LLM tools in your workflow.
Nice to have
- Cloud: GCP or AWS.
- Data tools: Apache Arrow, gRPC, columnar formats.
- Big data: Hadoop, Spark.
- Languages: C/C++, Golang, Scala, JavaScript.
- Financial services background; familiarity with the SRE principles from Google's Site Reliability Engineering book.
Culture & Benefits
- Intellectually driven environment encouraging open thinking, continuous improvement, and collaboration.
- Competitive compensation, a clear career roadmap, and learning opportunities: training, a library, guest speakers, and knowledge-sharing events.
- Premium health insurance, an employee assistance program, generous time off, sabbaticals, and trade union benefits.
- Team activities: monthly events, clubs (sports, yoga, gaming), daily snacks, happy hours.
- Annual company trips and global conferences for travel and networking.