Job description
TL;DR
Sr. SRE Engineer II - EPICS, NG-SIEM (Hybrid): Own the reliability and scalability of the NG-SIEM platform by building end-to-end observability, automation, and scaling systems across complex ingest-to-search-to-workflow pipelines. Focus on designing monitoring and synthetic test suites, coordinating proportional scaling to eliminate cascading bottlenecks, and leading incident response with cross-service diagnosis and post-incident improvements.
Location: Hybrid (2-3x a week in the office) in Bucharest, Romania
Company
CrowdStrike is a cybersecurity company building an AI-native platform for preventing and responding to breaches.
What you will do
- Design, build, and maintain end-to-end observability for the NG-SIEM pipeline (ingest through search and workflow execution) to enable rapid root-cause analysis across component boundaries.
- Engineer coordinated scaling solutions that treat the NG-SIEM pipeline as a unified system, proportionally increasing resources across dependent components (e.g., Kafka, ingest pipelines, downstream services).
- Serve as a subject matter expert during platform-wide incidents (P2 and above), diagnosing and resolving multi-component failures and coordinating incident communications; participate in follow-the-sun on-call rotations.
- Build and refine capacity forecasting and cost-management models for end-to-end pipeline dimensions, and develop tooling to surface cost drivers.
- Automate remediation via runbooks and self-healing workflows (e.g., pipeline-wide scaling responses, CID rebalancing, infrastructure healing) to resolve issues before customer impact.
- Collaborate with cell-level teams and stakeholders to triage SLO breaches and drive problem management for large reliability efforts.
Requirements
- 10+ years of experience in software engineering, site reliability engineering, or platform engineering, with significant time on large-scale distributed systems and pragmatic tradeoffs between delivery and long-term platform goals.
- Strong proficiency in at least one systems programming language (Go, Java, Rust, or C++) and one scripting language (Python, Bash).
- Deep experience with end-to-end observability: building monitoring pipelines, defining SLIs/SLOs, and creating dashboards that drive actionable insights across multi-service architectures.
- Proven ability to diagnose and resolve complex 24/7 incidents spanning multiple distributed components.
- Hands-on experience with streaming platforms (Kafka or similar), including backpressure, partition management, and consumer group dynamics at scale.
- Experience with infrastructure-as-code and CI/CD pipelines, plus comfort working across time zones with globally distributed teams.
Culture & Benefits
- Market-leading compensation and equity awards.
- Comprehensive physical and mental wellness programs.
- Competitive vacation and holidays, plus paid parental and adoption leave.
- Professional development opportunities for all employees.
- Vibrant office culture with world-class amenities; hybrid schedule (2-3x/week in Bucharest).