Site Reliability Engineer
Job description
TL;DR
Site Reliability Engineer (CloudBlue): Ensure the reliability, scalability, and observability of multi-tenant SaaS platforms for cloud commerce, with an emphasis on monitoring, high availability, and incident response. The focus is on designing fault-tolerant architectures, automating toil away, and improving the resilience of Kubernetes-based systems.
Remote role open to applicants worldwide, with priority given to candidates based in Malaysia due to team needs and coverage.
Company
Fast-growing web hosting company providing cloud services, website builders, and the CloudBlue platform for service providers worldwide.
What you will do
- Define and implement SLIs, SLOs, and error budgets for critical services.
- Design high-availability architectures with redundancy, failover, and disaster recovery.
- Build and operate the observability stack using Datadog, Grafana, and the Elastic Stack.
- Lead incident response, postmortems, and reliability improvements.
- Conduct capacity planning, load testing, and performance optimization.
- Automate processes and promote SRE best practices across teams.
Requirements
- 3+ years as SRE, DevOps, or Production Engineer with production ownership.
- Experience with highly available multi-tenant SaaS platforms.
- Hands-on with Datadog, Grafana, Elasticsearch/Kibana.
- Strong knowledge of Linux, networking, and distributed systems.
- Experience with Docker, Kubernetes, and Python/Bash scripting.
- Experience with on-call rotations and incident response.
- Strong written and spoken English.
Nice to have
- Defining SLIs/SLOs and error budgets at scale.
- Hyperscale or service-provider platforms.
- Cloud experience, preferably Azure.
- Hybrid/on-premises integrations.
- Chaos engineering and resilience testing.
Culture & Benefits
- Competitive salary and career advancement opportunities.
- Flexible work arrangements for work/life balance.
- Friendly culture built on trust, respect, and diversity.
- 24/7 award-winning customer support in four languages.
- Professional development and growth in a rapidly expanding team.