Updated 2 days ago

Sr. SRE Engineer II (EPICS, NG-SIEM)

Work format: hybrid
Employment type: fulltime
Grade: senior
English: b2
Country: Australia
Listing from Hirify RU Global, a list of companies with Eastern European roots.

Job description


TL;DR

Sr. SRE Engineer II (EPICS, NG-SIEM): Own the reliability and scalability of a next-generation SIEM platform, with an emphasis on end-to-end observability, coordinated scaling, and incident response across complex distributed pipelines. The role focuses on building automation and scaling systems that keep ingest, search, and workflow execution healthy under 24/7 high-volume load.

Location: Australia (Sydney) — hybrid; expected in the Sydney office (Level 18, 141 Walker Street, North Sydney) 2–3x a week.

Company

Global cybersecurity company building an AI-native security platform.

What you will do

  • Design, build, and maintain end-to-end observability (monitoring and synthetic tests) across the NG-SIEM pipeline from ingest through search and workflow execution.
  • Engineer coordinated scaling solutions that treat the NG-SIEM pipeline as a unified system and eliminate cascading bottlenecks across dependent components (e.g., Kafka, ingest pipelines, downstream services).
  • Lead platform-wide incident response (P2 and above) as a subject matter expert, diagnosing and resolving multi-component failures and coordinating incident communications; participate in follow-the-sun on-call rotations.
  • Build capacity forecasting and cost management models for end-to-end pipeline dimensions; develop tooling to track and surface cost drivers.
  • Automate remediation via runbooks (e.g., pipeline-wide scaling responses, CID rebalancing, infrastructure healing) to resolve issues before customer impact.
  • Collaborate with cross-functional teams to triage SLO breaches, drive problem management, and improve long-term platform resilience and efficiency.

Requirements

  • 10+ years of experience in software engineering, site reliability engineering, or platform engineering, with significant time on large-scale distributed systems.
  • Strong proficiency in at least one systems programming language (Go, Java, Rust, or C++) and one scripting language (Python, Bash).
  • Deep experience with end-to-end observability: building monitoring pipelines, defining SLIs/SLOs, and creating dashboards for multi-service architectures.
  • Proven ability to diagnose and resolve complex 24/7 incidents spanning multiple distributed components.
  • Experience with coordinated capacity planning and scaling for large infrastructure footprints.
  • Hands-on experience with streaming platforms (Kafka or similar), including backpressure, partition management, and consumer group dynamics at scale.

Nice to have

  • Experience in a similar reliability/platform role at a hyperscaler (AWS, Azure, GCP) or large-scale SaaS provider.
  • Track record of automated remediation and self-healing infrastructure.
  • Cost modeling and unit economics for large compute and storage footprints.
  • Familiarity with cloud-native architectures and serverless computing.
  • Exposure to log management, cybersecurity products, or security operations workflows.
  • Experience with disaster recovery planning and execution for multi-region systems.

Culture & Benefits

  • Hybrid work, with an expectation to be in the Sydney office 2–3x per week.
  • Market-leading compensation and equity awards.
  • Comprehensive physical and mental wellness programs.
  • Competitive vacation and holidays; paid parental and adoption leave.
  • Professional development opportunities for all employees.
  • Vibrant office culture and employee networks/volunteer opportunities.
