Principal Site Reliability Engineer
Мэтч & Сопровод
Для мэтча с этой вакансией нужен Plus
Описание вакансии
TL;DR
Principal Site Reliability Engineer (AWS/Kubernetes/Spark): Architecting scalable cloud infrastructure, data pipelines, and developer platforms with an accent on reliability, observability, and SDLC optimization. Focus on designing self-service tools, integrating AIOps for automated remediation, and driving cross-team alignment to boost engineering velocity and system resilience.
Location: Hybrid (1-3 days/week in office) if local to Bellevue, WA or New York, NY; fully remote if located further away. Must be authorized to work in the United States on a full-time basis without sponsorship.
Salary: $163,620 - $212,710 USD annually (includes base salary, bonus, commission; plus equity).
Company
.tv changes how brands measure TV advertising impact using big data on AWS with Kubernetes clusters.
What you will do
- Architect and maintain scalable AWS infrastructure with Kubernetes, focusing on high availability and data pipeline reliability (Spark, EMR).
- Implement observability (SLIs/SLOs, monitoring, alerting) and automation (Terraform, incident response, AIOps with LLMs).
- Design self-service developer platforms, CI/CD pipelines (CircleCI, ArgoCD, Helm), and AI tools for code generation and efficiency.
- Lead cost optimization, mentorship, documentation, and KPIs for developer experience.
- Define SRE/DevEx roadmap, identify bottlenecks, and align infrastructure, security, and product teams.
Requirements
- Authorized to work in the US without sponsorship.
- Bachelor's in CS/Engineering or equivalent; 10+ years in software engineering/SRE/cloud, 3+ years leading.
- Deep AWS expertise (EKS, ECR, RDS, SQS/SNS, VPC, MWAA, S3); IaC (Terraform); Kubernetes/Helm/ArgoCD (5+ years).
- Spark optimization for large-scale data; CI/CD (CircleCI); scripting (Python/shell/JS); networking; monitoring (OTel, Splunk/DataDog).
- Experience with GenAI tools, vendor assessments, and Ad-Tech/big data preferred.
Nice to have
- Experience in Ad-Tech or big data processing organizations.
- Native AI observability tools.
Culture & Benefits
- Hybrid/flexible workplace: office 1-3 days/week if local, fully remote otherwise.
- Competitive compensation with equity, overtime for non-exempt, market-informed packages.
- Standard benefits; focus on engineering excellence, growth opportunities, and impactful projects in a startup environment.
Будьте осторожны: если работодатель просит войти в их систему, используя iCloud/Google, прислать код/пароль, запустить код/ПО, не делайте этого - это мошенники. Обязательно жмите "Пожаловаться" или пишите в поддержку. Подробнее в гайде →