Intermediate Site Reliability Engineer (AIOps)
Мэтч & Сопровод
Для мэтча с этой вакансией нужен Plus
Описание вакансии
TL;DR
Intermediate Site Reliability Engineer (AIOps): Design and implement AI-powered solutions for observability, incident response, and cloud-native platform optimization with an accent on ML-based anomaly detection, generative AI automation, and predictive reliability. Focus on building self-healing systems, time-series forecasting, and fault-tolerant infrastructure using AI agents and orchestration.
Location: Hybrid in Toronto, Ontario; must reside within commutable distance to Mississauga or Salt Lake City office. In-office events required including weekly/bi-weekly/monthly team events.
Salary: CAD $115,000–$128,000 + bonus + benefits
Company
Leading health tech SaaS platform serving 30,000+ long-term and post-acute care providers with AI-accelerated innovation.
What you will do
- Build ML-based anomaly detection, pattern recognition, and smart telemetry enhancement for AI-driven observability.
- Develop event-driven workflows, self-healing systems, and automate incident response using generative AI and agent orchestration.
- Implement time-series forecasting, predictive modeling, AI-powered autoscaling, and cost-aware resource allocation.
- Engineer scalable, fault-tolerant cloud-native systems and participate in on-call rotations for critical incident response.
- Integrate APIs for data exchange and run AIOps workshops to enable teams with AI maturity models and responsible AI practices.
Requirements
- 5+ years in software engineering with SRE principles and AI/ML in production environments.
- Strong debugging, problem-solving, and system design skills; passion for automation and operational excellence.
- Languages: Python, Java, Bash, Terraform.
- Platforms: Azure, Kubernetes, Docker.
- Tools: Datadog, Prometheus, AppDynamics, ELK, GitHub Actions, Jenkins, ArgoCD, Spinnaker; Databases: SQL Server, PostgreSQL, MySQL.
- ML/AI: MCP framework, AI agents, Vector store, LangChain, RAG.
Nice to have
- Experience with AIOps platforms.
- Contributions to open-source or AI communities.
- Familiarity with Responsible AI frameworks.
- Participation in AI hackathons or conferences.
Culture & Benefits
- Benefits from Day 1 including retirement plan matching, flexible PTO, wellness programs.
- Parental & caregiver leaves, fertility & adoption support, continuous development program.
- Employee Assistance Program, allyship and inclusion communities, employee recognition.
- Flexibility, growth opportunities, and meaningful work in a founder-led, privately held company.
Будьте осторожны: если работодатель просит войти в их систему, используя iCloud/Google, прислать код/пароль, запустить код/ПО, не делайте этого - это мошенники. Обязательно жмите "Пожаловаться" или пишите в поддержку. Подробнее в гайде →