Site Reliability Engineer (Azure)
Job description
TL;DR
Site Reliability Engineer (Azure/DevOps): Manage and optimize production Enterprise clusters in the Azure cloud, with an emphasis on incident management, observability, and automation. The role focuses on designing AI-driven monitoring systems, troubleshooting large-scale distributed systems, and driving long-term reliability improvements.
Location: Must be a U.S. citizen and based in the United States due to security clearance requirements.
Company
The employer is a unicorn company providing high-performance data platforms used by over 10,000 global businesses.
What you will do
- Own and manage production incidents impacting Enterprise clusters in the Azure cloud.
- Troubleshoot complex issues across distributed systems and drive root cause analysis.
- Design and develop automation tools and internal platforms using AI-assisted development tools like Cursor and Codex.
- Enhance observability using Prometheus, Grafana, and Azure Monitor by building AI-driven systems for anomaly detection.
- Participate in a 24/7 global follow-the-sun on-call rotation.
- Partner with R&D and Product teams to resolve bugs and influence product improvements.
Requirements
- 4+ years of experience in SRE, Cloud Operations, or Infrastructure Engineering.
- U.S. citizenship required for eligibility for a U.S. Top Secret/SCI security clearance.
- Proven experience troubleshooting production systems at scale with major cloud providers, specifically Azure.
- Strong Linux/Unix systems knowledge and understanding of networking fundamentals (TCP/IP).
- Proficiency in scripting with Python and Bash.
- Experience with monitoring tools (Prometheus, Grafana, ELK, Splunk) and KQL for telemetry analysis.
Nice to have
- Familiarity with or other NoSQL databases.
- Experience with Infrastructure as Code (Terraform, Pulumi).
- Cloud and Linux certifications.
- Experience with C#.
- Experience in regulated environments, such as FedRAMP or air-gapped deployments.
Culture & Benefits
- Opportunity to work with cutting-edge SRE tools and state-of-the-art products.
- Role focused on tackling technical challenges on a global scale.
- Commitment to a diverse and inclusive work environment where all differences are celebrated.
- Support for accessibility and reasonable accommodations for applicants with disabilities.