Site Reliability Engineer (AWS, GCP, Kubernetes)
Job description
TL;DR
Site Reliability Engineer (AWS, GCP, Kubernetes): Ensure platform resilience, scalability, and performance across multi-cloud environments, with an emphasis on observability, automation, and first-principles engineering. The role focuses on building reliability guardrails, managing EKS/GKE clusters, and implementing SLOs/SLIs to prevent system failures.
Location: Hybrid schedule (Tuesday through Thursday), based in South Charlotte, NC or New York, NY. Visa sponsorship and corp-to-corp arrangements are not available.
Salary: $100,000 - $145,000 per year
Company
A global portfolio of high-growth companies delivering seamless digital experiences for consumers and Fortune 100 clients.
What you will do
- Ensure system reliability and performance across multi-cloud, multi-region platforms using first-principles thinking.
- Build and maintain observability solutions using OpenTelemetry, New Relic, Grafana, and Prometheus.
- Automate infrastructure provisioning and deployments using Terraform.
- Manage and optimize Kubernetes clusters (EKS, GKE) focusing on security and operational excellence.
- Lead incident response efforts, conduct root cause analysis, and implement preventive measures.
- Partner with engineers to embed reliability best practices into architecture and delivery pipelines.
Requirements
- 3–5 years of experience in SRE, DevOps, or cloud infrastructure engineering.
- Strong hands-on experience with AWS, GCP, and Kubernetes (EKS, GKE).
- Proficiency with Terraform and scripting languages (Python, Bash, Go).
- Experience building observability systems and managing CI/CD pipelines (Harness, Jenkins, GitLab CI).
- Proven track record of maintaining high-availability systems (99.9%+ uptime).
- Must be authorized to work in the US without visa sponsorship (no H-1B, F-1, OPT, STEM-OPT, or TN).
Nice to have
- Cloud certifications (AWS Solutions Architect, GCP Professional Cloud Architect).
- Experience with data platform infrastructure (Databricks, Snowflake).
- Familiarity with security scanning tools (Wiz, Aqua, Prisma Cloud) or compliance frameworks (SOC 2, PCI-DSS, HIPAA).
- Experience with chaos engineering, resilience testing, or database performance tuning.
Culture & Benefits
- Focus on strategic engineering and first-principles thinking rather than firefighting.
- Comprehensive health insurance coverage (medical, dental, and vision).
- 401(k) plan with company match.
- Flexible Paid Time Off (PTO) starting at 20 days annually.
- Paid Parental Bonding Benefit Program.