AI Infrastructure & Reliability Engineer (AWS)
Job description
TL;DR
AI Infrastructure & Reliability Engineer (AWS/K8s): Build and operate the internal foundation for the AI platform, with an emphasis on core cloud infrastructure, CI/CD pipelines, and SRE practices. The role focuses on designing automated IaC repositories, implementing AI-driven deployment workflows, and ensuring high reliability of LLM operations at scale.
Location: Hybrid (Israel)
Company
The company helps modern, mid-size businesses transform the way they manage people through an intuitive, data-driven HR platform.
What you will do
- Design and build Terraform repositories and IaC pipelines from scratch with integrated drift detection and policy enforcement.
- Develop AI-driven GitHub Actions pipelines for automated code review and intelligent deployment decisions.
- Manage Kubernetes workloads across AWS accounts, ensuring zero downtime and automated scaling.
- Define and enforce SLOs/SLIs and error budgets; lead incident response and foster a postmortem culture.
- Operate LLM APIs in production, managing rate limits, token quotas, and resilience patterns like circuit breakers.
- Implement FinOps practices to optimize AWS, LLM, and observability spending and provide visibility into it.
Requirements
- 2-4 years of hands-on experience in DevOps, SRE, or infrastructure engineering within production SaaS environments.
- Strong AWS expertise, including multi-account architecture, IAM, Lambda, SQS, SNS, and EKS.
- Proven production experience with Kubernetes and stateful workload management.
- Proficiency with Terraform module architecture and GitHub Actions for end-to-end CI/CD.
- Working Python proficiency for scripting, internal tooling, and workflow automation.
- Practical experience building observability stacks (metrics, logging, distributed tracing) from scratch.
Nice to have
- Experience operating LLM APIs in production, including latency monitoring and cost attribution.
- FinOps experience across cloud and AI infrastructure.
- Experience introducing self-healing or auto-remediation patterns in production.
Culture & Benefits
- Flexible hybrid working model with a work-from-home allowance.
- Competitive compensation including a company share options plan (pre-IPO equity).
- Wellness perks: Annual Headspace subscription and monthly Wolt allowance.
- Health and support: Paid sick leave from day one and a transportation allowance.
- Work-life balance: 4 additional "Bob balance days" per year and 2 Social Impact days for volunteering.
- Dog-friendly office and temporary remote work from anywhere (up to 2 months after 6 months of employment).