Operations Engineer (SRE)
Job description
TL;DR
Operations Engineer (SRE): Monitor and investigate production issues across a global platform, with an emphasis on incident response, observability, and operational automation. The role focuses on diagnosing high-availability services, maintaining incident communication, and improving detection workflows for a platform used by game developers worldwide.
Location: Baku, Azerbaijan (Remote flexibility indicated)
Company
A global commerce company that provides tools and services for video game developers to fund, distribute, and monetize their games.
What you will do
- Monitor the GTO Operational Dashboard (Datadog) to detect anomalies and identify potential production incidents.
- Triage and investigate production incidents, determining root causes and routing issues to appropriate SRE or engineering teams.
- Manage lower-severity incidents end-to-end, executing runbooks and resolving issues within defined thresholds.
- Support the TSO Lead during major incidents by providing real-time data, timeline management, and stakeholder communication.
- Develop and maintain operational automation scripts, incident templates, and documentation for new resolution procedures.
- Analyze incident trends and draft PIR (Post-Incident Review) documentation to contribute to long-term system stability.
Requirements
- 4+ years of experience in SRE, DevOps, NOC, or technical operations supporting high-availability platforms.
- Proficiency in at least one scripting language: Python, Go, or Bash.
- Hands-on experience with Datadog or equivalent observability platforms (Grafana, Splunk, New Relic).
- Solid working knowledge of Kubernetes and cloud infrastructure (GCP preferred).
- Strong written and verbal communication skills in English for status reporting and incident documentation.
- Comfort with 24x7 shift-based operations including rotating weekends.
Nice to have
- Experience in gaming, payments, or fintech environments with high transaction volume.
- Familiarity with database and messaging operations (MySQL, PostgreSQL, Redis, Kafka).
- Knowledge of CI/CD pipelines and deployment tooling such as GitLab CI, ArgoCD, or Helm.
- ITIL Foundation certification or experience administering Jira Service Management.
Culture & Benefits
- Fast-paced, collaborative, and globally distributed environment.
- Direct impact on services used by thousands of game developers worldwide.
- Exposure to cutting-edge observability and AI-assisted monitoring tools.
- Emphasis on process improvement, runbook development, and professional growth within SRE operations.