Job description
TL;DR
Incident Manager (AI): Lead critical production incidents and coordinate multi-disciplinary response efforts for a cloud-native data and AI platform, with an emphasis on reliability engineering and stakeholder communication. The role focuses on driving root cause analysis, optimizing incident playbooks, and ensuring the technical resilience of distributed systems.
Location: Remote (US)
Salary: $103,900 to $145,525 USD
Company
Databricks is a data and AI company providing a Data Intelligence Platform to unify data, analytics, and AI for thousands of organizations worldwide.
What you will do
- Lead critical production incidents, coordinating multi-disciplinary response efforts across cloud services to restore operations.
- Drive technical root cause analysis (RCA) and collaborate with engineering to document failures in distributed systems.
- Own all communications during incidents, providing high-quality updates to executives and publishing customer-facing notifications.
- Implement reliability improvements and ensure technical and procedural action items are completed.
- Mentor and train peers in incident communication and technical response disciplines.
Requirements
- Must be based in the United States.
- 5+ years of experience in incident management, SRE, or production operations for large-scale, cloud-native systems.
- Strong understanding of cloud infrastructure (AWS, Azure, or GCP), including compute, networking, and storage.
- Expertise in log analysis and observability tools such as Datadog, Elasticsearch, Splunk, Prometheus, or Grafana.
- Proficiency in Python, Go, or Bash for automating diagnostics and data collection.
- Bachelor's, Master's, or other advanced degree in Computer Science, Computer Engineering, or a related field.
Culture & Benefits
- Comprehensive benefits and perks package tailored to the employee's region.
- Eligibility for annual performance bonuses and equity.
- Commitment to a diverse and inclusive culture with equal employment opportunity standards.
- Opportunity to work on world-class data and AI infrastructure used by Fortune 500 companies.