This job posting is archived
Senior Manager, Infrastructure Reliability and AIOps Engineering (AI)
Job description
TL;DR
Senior Manager, Infrastructure Reliability and AIOps Engineering (AI): Accountable for improving reliability, observability, and automated recovery across cloud infrastructure, networking, enterprise tools, and IAM, with an emphasis on AIOps practices and operational outcomes, including incident response and escalations. The role focuses on preventing incidents, reducing alert noise, and improving recovery performance through strategic reliability improvements and hands-on operational leadership.
Location: Hungary
Company
The company empowers over 8,000 organizations worldwide to create the best customer and employee experiences through its AI-powered Experience Orchestration platform, Cloud.
What you will do
- Own the reliability execution model from signal to restoration, including active incident engagement and escalation management.
- Operate and continuously improve the AIOps layer, focusing on event ingestion, correlation, and noise reduction.
- Define and measure reliability across services and platforms using SLIs, SLOs, and scorecards.
- Build and execute a reliability automation roadmap to reduce manual intervention and accelerate recovery.
- Drive resilient operational patterns, standardized health signals, and automated recovery paths for Cloud Infrastructure, Networking, Enterprise Tools, and IAM.
- Partner with ITSM/Platform Enablement and Security/Compliance to strengthen event-to-incident flows and ensure reliability supports control execution.
Requirements
- 8+ years in infrastructure operations, SRE, reliability engineering, or platform operations.
- 5+ years leading teams in an operations, reliability, or engineering environment.
- Proven track record of designing, architecting, and building reliability through AIOps, observability, automation, and incident learning.
- Experience building and operating alert/event management practices (signal quality, routing, enrichment, deduplication, suppression).
- Working knowledge across cloud infrastructure concepts, enterprise networking fundamentals, enterprise tool operations, and IAM lifecycle concepts.
- Strong incident command and stakeholder communication skills.
Nice to have
- Experience implementing practical SLOs/SLIs and operational scorecards tied to business impact.
- Experience with AIOps platforms and event-to-ITSM integration patterns.
- Leadership in scripting/automation (PowerShell, Python, Ansible, Terraform) and operationalizing safe automation at scale.
- Familiarity with service mapping, CMDB dependency modeling, and operational governance practices.
Culture & Benefits
- Work for a company that embraces empathy and cultivates collaboration.
- Opportunity to make a larger impact on the company and take ownership of your work.
- Receive great benefits and perks comparable to those at larger tech companies.
- Join a global team of over 6,000 employees.