Posted 1 day ago

Sr. SRE Engineer II - EPICS, NG-SIEM (Hybrid)

Work format: Hybrid
Employment type: Full-time
Level: Senior
English: B2
Country: Romania
Listed in Hirify RU Global, a list of companies with Eastern European roots.

Job description


TL;DR

Sr. SRE Engineer II - EPICS, NG-SIEM (Hybrid): Own the reliability and scalability of the NG-SIEM platform by building end-to-end observability, automation, and scaling systems across complex ingest-to-search-to-workflow pipelines. Focus areas include designing monitoring and synthetic test suites, coordinating proportional scaling to eliminate cascading bottlenecks, and leading incident response with cross-service diagnosis and post-incident improvements.

Location: Hybrid (2-3x a week) in Bucharest, Romania

Company

CrowdStrike is a cybersecurity company building an AI-native platform for preventing and responding to breaches.

What you will do

  • Design, build, and maintain end-to-end observability for the NG-SIEM pipeline (ingest through search and workflow execution) to enable rapid root-cause analysis across component boundaries.
  • Engineer coordinated scaling solutions that treat the NG-SIEM pipeline as a unified system, proportionally increasing resources across dependent components (e.g., Kafka, ingest pipelines, downstream services).
  • Serve as a subject matter expert during platform-wide incidents (P2 and above), diagnosing and resolving multi-component failures and coordinating incident communications; participate in follow-the-sun on-call rotations.
  • Build and refine capacity forecasting and cost-management models for end-to-end pipeline dimensions, and develop tooling to surface cost drivers.
  • Automate remediation via runbooks and self-healing workflows (e.g., pipeline-wide scaling responses, CID rebalancing, infrastructure healing) to resolve issues before customer impact.
  • Collaborate with cell-level teams and stakeholders to triage SLO breaches and drive problem management for large reliability efforts.

Requirements

  • 10+ years of experience in software engineering, site reliability engineering, or platform engineering, including significant work on large-scale distributed systems and a track record of making pragmatic tradeoffs between delivery and long-term platform goals.
  • Strong proficiency in at least one systems programming language (Go, Java, Rust, or C++) and one scripting language (Python, Bash).
  • Deep experience with end-to-end observability: building monitoring pipelines, defining SLIs/SLOs, and creating dashboards that drive actionable insights across multi-service architectures.
  • Proven ability to diagnose and resolve complex incidents spanning multiple distributed components in 24/7 production environments.
  • Hands-on experience with streaming platforms (Kafka or similar), including backpressure, partition management, and consumer group dynamics at scale.
  • Experience with infrastructure-as-code and CI/CD pipelines, plus comfort working across time zones with globally distributed teams.

Culture & Benefits

  • Market-leading compensation and equity awards.
  • Comprehensive physical and mental wellness programs.
  • Competitive vacation and holidays, plus paid parental and adoption leaves.
  • Professional development opportunities for all employees.
  • Vibrant office culture with world-class amenities; hybrid schedule (2-3x/week in Bucharest).
