Sr. Site Reliability Engineer I
Job description
TL;DR
Sr. Site Reliability Engineer I (SRE): Build and maintain the reliability, scalability, and performance of digital media measurement platforms, with an emphasis on observability, proactive monitoring, and reducing MTTR through automation. Focus on incident response for Sev1/Sev2 situations, infrastructure reliability across GCP/AWS/OCI and on-prem, and production-grade Infrastructure-as-Code using Terraform and Kubernetes.
Location: NYC Global HQ
Salary: $89,000.00 - $178,000.00
Company
The company provides digital performance solutions that verify, optimize, and prove the quality and effectiveness of digital advertising using third-party data and analytics.
What you will do
- Build and maintain reliability, scalability, and performance for digital media measurement platforms.
- Implement observability best practices (metrics, dashboards, alerting) to drive proactive reliability improvements.
- Reduce MTTR for critical incidents using automation, improved observability, and proactive monitoring.
- Respond to incidents and drive resolution for Sev1/Sev2 situations; participate in on-call rotations and post-incident reviews.
- Maintain high-availability infrastructure and services across GCP, AWS, OCI, and on-prem environments.
- Use Infrastructure-as-Code (Terraform, Helm, Python/scripts) to deploy repeatable, version-controlled infrastructure, and build automation that eliminates operational toil.
Requirements
- 4+ years in Site Reliability Engineering, DevOps, or related operational roles with proven Linux/Unix systems administration experience.
- Proficiency in scripting/programming for automation and tooling (Python, Bash, or Go).
- Strong experience with cloud infrastructure across GCP, AWS, and OCI, plus container orchestration with Kubernetes.
- Experience with monitoring and observability tools (Prometheus, Grafana, Splunk, Nagios).
- Hands-on Infrastructure-as-Code experience (Terraform, Ansible, or Helm).
- Ability to define and track SLIs, SLOs, and SLAs to drive reliability improvements.
Culture & Benefits
- Hybrid work: 3 days per week in office.
- Bonus/commission eligibility (as applicable), plus equity and benefits.
- Emphasis on automation, continuous improvement, and ownership for complex reliability challenges.
- Mentorship and knowledge sharing to elevate team capabilities.
- Use of AI-assisted development tools to accelerate automation and problem resolution.
Hiring process
- Interviews focused on reliability/operations experience, automation, and incident management.
- Discussion of technical approach to observability, SLI/SLOs, and Infrastructure-as-Code.
- Evaluation of communication and cross-team collaboration skills.