Site Reliability Engineer (AI)
Job description
TL;DR
Site Reliability Engineer (Kubernetes/AI): Deploying and managing containerized applications across GKE, EKS, and on-premises environments, with an emphasis on high availability and infrastructure stability. Focus on improving monitoring stacks, implementing InfoSec configurations, and performing root-cause analysis to keep aircraft turnaround software running smoothly.
Location: Remote
Company
The company provides AI-based computer-vision software solutions to help global airports and airlines improve on-time performance, efficiency, and safety.
What you will do
- Deploy and manage containerized applications across GKE, EKS, and on-premises Kubernetes environments.
- Maintain complex infrastructure through proactive administration to ensure high availability and optimal performance.
- Enhance and maintain monitoring and alerting stacks using Prometheus, Grafana, and Alertmanager.
- Participate in on-call rotations and perform root-cause analysis (RCA) to resolve production issues.
- Implement and manage InfoSec configurations, including firewall management and backup strategies.
- Collaborate with Project Managers and cross-functional teams to align infrastructure with project requirements.
Requirements
- 3+ years of experience as an SRE, IT Operations Engineer, or Systems Administrator.
- Strong Linux system administration skills and a deep understanding of networking concepts.
- Hands-on experience managing Kubernetes clusters.
- Proficiency with monitoring stacks including Prometheus, Grafana, and Alertmanager.
- Automation mindset with scripting skills in Python, Bash, or similar.
- Excellent verbal and written English skills.
Nice to have
- Experience with on-premises Kubernetes installations, microservice architecture, and Helm.
- Experience with Jsonnet for deployment configurations.
- Understanding of Machine Learning, Computer Vision, or AI-based video analytics.
- Familiarity with QA practices and CI/CD workflows.
- Relevant certifications such as CKA/CKAD, AWS, or Google Cloud.
Culture & Benefits
- Fully remote work with a flexible schedule.
- Paid vacation and sick leave.
- Budget for relevant courses and online education.
- Company culture based on honesty and mutual respect.
- Live team events.