Site Reliability Engineering (SRE) Intern
Job description
TL;DR
Site Reliability Engineering (SRE) Intern: Help monitor and maintain the reliability of cloud-based services and platforms, and support SRE operations with an emphasis on incident management, automation, and observability. Focus on building tools, troubleshooting databases and event-streaming systems, and contributing to infrastructure provisioning and CI/CD.
Remote position based anywhere in the United States or Canada.
24.00 - 34.50 USD Hourly (varies by location: SF Bay Area 27.60-34.50, other US 24.00-30.00).
Company
The company provides cloud platforms and services for broadband communications.
What you will do
- Monitor and maintain reliability of cloud-based services and platforms.
- Support incident investigation, root cause analysis, and post-incident documentation.
- Participate in 24/7 rotational shift coverage for monitoring and alert triage under supervision.
- Build and enhance automation tools and scripts using Python and Shell.
- Contribute to observability with metrics, logs, and traces using Grafana and Prometheus.
- Troubleshoot databases and Kafka/event-streaming systems, and manage infrastructure as code (IaC) with Terraform.
- Assist with CI/CD pipelines, deployments, runbooks, dashboards, and documentation.
- Collaborate with engineers to improve system resilience and scalability.
Requirements
- Currently enrolled in a Bachelor's or Master's program in Computer Science, Engineering, or a related field; preference for junior- or senior-year students with prior experience.
- Strong fundamentals in Linux/Unix systems and command-line usage.
- Basic understanding of networking concepts (TCP/IP, DNS, load balancing).
- Familiarity with Python and shell scripting.
- Basic knowledge of databases (MySQL, PostgreSQL, MongoDB).
- Willingness to participate in 24/7 rotational shifts.
- Good problem-solving skills; able to work the full summer (May-August or June-September).
Nice to have
- Exposure to cloud platforms (GCP, AWS, Azure).
- Familiarity with Kafka or distributed messaging.
- Database reliability concepts (backups, replication, failover).
- Awareness of Kubernetes and containerized workloads.
- Experience with Git and CI/CD (Jenkins).
- Interest in SRE principles (SLIs, SLOs, error budgets).
Culture & Benefits
- Part of an award-winning summer intern program with training and on-the-job learning.
- 90-day program with hands-on exposure to production systems and 24/7 operations.
- Globally distributed SRE team.
- Benefits information available on company careers page.