TL;DR
Staff Site Reliability Engineer (AI/Azure): Lead the design and optimization of scalable, resilient infrastructure for cloud-native AI services on Azure, with an emphasis on continuous delivery, observability, and automation. The role focuses on architecting solutions, establishing best practices for SLIs/SLOs, and fostering a reliability culture across teams.
Location: US Remote
Salary: $183,400–$245,400 USD
Company
hirify.global is a product company building an AI platform that powers autonomous agents and real-time learning.
What you will do
- Lead the design, implementation, and optimization of scalable, resilient infrastructure for cloud-native AI services on Azure.
- Establish continuous delivery (CD) pipelines supporting blue-green deployments, automatic rollbacks, and progressive delivery patterns.
- Champion observability excellence, defining best practices for metrics, tracing, logging, SLIs, SLOs, and error budgets.
- Drive automation across the entire lifecycle: infrastructure provisioning, testing, deployment, and recovery.
- Partner with the engineering team to design reliable, fault-tolerant services and perform resilience and capacity reviews.
- Mentor engineers and foster a reliability culture across teams to enable self-healing, observable systems.
Requirements
- Bachelor’s or Master’s degree in Computer Science, Information Technology, or a related field.
- Solid experience in SRE, DevOps, or infrastructure engineering, with strong hands-on expertise in Azure.
- Proven experience designing and operating distributed systems at scale, with a strong understanding of reliability engineering principles (SLIs, SLOs, SLAs).
- Deep proficiency with Terraform, Kubernetes, Docker, and modern IaC and container orchestration best practices.
- Expertise in CI/CD automation and release engineering, capable of implementing blue-green, canary, and rollback mechanisms.
- Advanced proficiency with observability tools such as Mimir, Grafana, Prometheus, and the ELK stack.
Nice to have
- Knowledge of SQL Server and PostgreSQL performance tuning and management in cloud environments.
- Experience promoting GitOps workflows and tools such as Argo CD or Flux.
Culture & Benefits
- Flexible time off with ample learning and development opportunities including leadership training.
- Comprehensive onboarding program and recognition through Bonusly and peer-nominated awards.
- Company-paid medical, dental, and vision (with 100% employer-paid options and 90% coverage for dependents), FSA, HSA, 401(k) match, and telehealth options including One Medical.
- Parental leave and support, up to $20k in fertility services, surrogacy, and adoption reimbursement, maternity support through Maven Maternity, and free breast milk shipping through Maven Milk.
- Pet insurance, legal advisory services, and financial planning tools.
- Commitment to individuality, uniqueness, and diversity in a non-discriminatory environment.