Senior/Staff Site Reliability Engineer (AI)
Job description
TL;DR
Senior/Staff Site Reliability Engineer (AI): Manage and operate Kubernetes infrastructure at scale, with an emphasis on AI-driven automation, networking, and deployment reliability. The role focuses on building CI/CD pipelines, optimizing cluster performance, and driving system-wide reliability through infrastructure-as-code and advanced monitoring.
Location: Onsite in San Francisco, CA. Visa sponsorship and relocation to San Francisco provided.
Salary: $180,000–$250,000 plus equity.
Company
An innovative technology company building AI-driven infrastructure and development tools.
What you will do
- Own and operate Kubernetes infrastructure, including lifecycle management, upgrades, and networking.
- Build and maintain robust CI/CD pipelines and deployment infrastructure.
- Leverage AI to automate production issue resolution and enhance software reliability.
- Define SLOs and develop incident response processes.
- Manage load balancing, service mesh configurations, and anomaly detection systems.
Requirements
- 5+ years of experience managing critical production systems and software development workflows.
- Strong production experience with Kubernetes at scale and infrastructure-as-code (Terraform, Ansible).
- Deep knowledge of Linux networking, container networking (CNI, VXLAN, BGP), and DNS.
- Proficiency in Python and either Go or Bash for automation.
- Experience with logging, monitoring, and alerting (Prometheus, Grafana, Datadog).
- Excellent communication skills and ability to drive technical decisions across teams.
Nice to have
- Experience with GPU and AI/ML workload management.
- Knowledge of kernel-based monitoring like eBPF and XDP.
- Experience with security tooling (co, SIEM) and distributed storage systems like Ceph.
Culture & Benefits
- Comprehensive health, dental, and vision insurance coverage.
- Equity package as part of compensation.
- Opportunities for professional learning and growth.
- Regular team events and offsites to build culture.
- Visa sponsorship and relocation assistance provided.