Lead Site Reliability Engineer (AWS/Kubernetes)
Job description
TL;DR
Lead Site Reliability Engineer (SRE): Guide architecture, reliability, and operational excellence for the infrastructure powering a secure, mission-critical collaboration platform. The role emphasizes scalability, observability, performance, and automation across cloud and hybrid environments. Focus areas include designing containerized workloads, establishing monitoring frameworks, driving incident management, ensuring regulatory compliance, and mentoring SRE team members.
United States. Remote-first. Candidates residing in the U.S. must be U.S. citizens eligible to obtain and maintain a government security clearance, and must meet U.S. export control requirements (EAR, ITAR).
Posting range: $170,000 to $200,000 USD
Company
A leading collaborative workflow platform for defense, intelligence, security, and critical infrastructure organizations. It is trusted by the U.S. Department of War and Fortune 500 companies, and runs on-premises and in private clouds.
What you will do
- Define the strategy, architecture, and roadmap for the SRE function, aligning them with product and business goals.
- Design, deploy, and optimize containerized workloads, infrastructure-as-code, and compliant cloud environments (e.g., FedRAMP, DoD).
- Establish observability, monitoring, and alerting frameworks for performance, reliability, and capacity planning.
- Drive incident management, on-call rotations, root cause analysis, and reliability improvements.
- Partner with security and compliance teams on data sovereignty and regulatory requirements; manage cloud costs and capacity.
- Build a developer platform for secure software delivery; mentor and coach the SRE team.
Requirements
- BS in Computer Science, Cybersecurity, Software Engineering, or equivalent, plus 5+ years in SRE, DevOps, or cloud infrastructure.
- Expertise in container orchestration (Kubernetes), infrastructure-as-code (Terraform), cloud platforms (AWS).
- Experience designing monitoring and alerting, optimizing performance, and troubleshooting distributed systems.
- Proficiency in scripting and programming for automation; excellent communication skills and the ability to influence cross-functionally.
- Experience leading globally distributed teams in a remote-first environment. U.S. applicants must be U.S. citizens eligible for a security clearance and must meet export control requirements (EAR/ITAR).
Nice to have
- Familiarity with Grafana and Prometheus; experience with high-availability and disaster recovery architectures.
- Exposure to GCP and Azure; leadership experience in regulated industries (defense, finance).
- Knowledge of U.S. federal compliance (FedRAMP, DoD ATO, NIST); AWS Marketplace experience.
- Open-source contributions; certifications (CKA, CKAD, AWS Solutions Architect).
Culture & Benefits
- Remote-first, open-source company with globally distributed teams.
- Market-based pay reflecting skills, experience, location, and market conditions.
- EEO employer committed to diversity; accommodations available during interviews.
- Expanding hiring to more countries while ensuring local compliance.