Senior Site Reliability Engineer (Web3)
Job Description
Senior Site Reliability Engineer CCIP
Company
Chainlink
Conditions
Posted: 1 day ago
Level: Senior
Salary: 129K - 244K
Location: Anywhere, Remote
Type: Full Time
Category: Engineering
Jobs by Chainlink
Skills
On-call, CCIP, Reliability, OpenTelemetry, SRE, SLI, SLO, Platform, Crypto, Observability, Kubernetes, Incident Response, Web3, Automation
About the Role
You will ensure the reliability, scalability, and operational excellence of the CCIP platform. You will strengthen production resilience by improving deployment safety, establishing distributed tracing for observability, eliminating operational toil through automation, driving adoption of meaningful SLOs, SLIs, and error budgets, and increasing platform scalability and readiness as CCIP grows. You will be responsible for maintaining highly available production systems and reducing operational overhead.
Requirements
- Demonstrated experience in Site Reliability Engineering, Production Engineering, or a similar role operating large-scale distributed systems.
- Deep expertise defining, implementing, and driving adoption of SLOs, SLIs, and error budgets across engineering organizations.
- Built and operated production Kubernetes environments supporting critical services.
- Applied OpenTelemetry to improve observability across distributed systems.
- Experience improving the reliability, scalability, and operability of production infrastructure.
- Demonstrated technical leadership influencing reliability practices across engineering teams.
- Experience performing capacity planning and performance tuning for high-throughput distributed services.
- Previous experience working on Web3 infrastructure or within a crypto-native engineering organization.
- Applied chaos engineering or fault-injection techniques to improve production resilience.
- Partnered with software engineering teams to conduct production-readiness reviews before service launches.
- Experience leading on-call operations, including defining rotations, escalation policies, and improving alert quality.
Responsibilities
- Improve deployment safety and increase delivery velocity by advancing production engineering practices.
- Establish distributed tracing across the platform to improve observability and accelerate incident investigation.
- Eliminate operational toil through automation that increases engineering efficiency and platform reliability.
- Drive adoption of meaningful SLOs, SLIs, and error budgets that guide engineering decisions and improve service health.
- Increase platform scalability and operational readiness as CCIP continues to grow.
- Strengthen Chainlink's reputation through highly available production systems while reducing operational overhead.
Benefits
- Long-term incentives
- Comprehensive benefits