Company hidden
Posted 4 days ago

AI Infrastructure & Reliability Engineer (AWS)

Work format: hybrid
Employment type: full-time
Grade: middle
English: B2
Country: Israel

Job description

TL;DR

AI Infrastructure & Reliability Engineer (AWS/K8s): Build and operate the internal foundation for the AI platform, with an emphasis on core cloud infrastructure, CI/CD pipelines, and SRE practices. The role focuses on designing automated IaC repositories, implementing AI-driven deployment workflows, and keeping LLM operations highly reliable at scale.

Location: Hybrid (Israel)

Company

hirify.global helps modern, mid-size businesses transform the way they manage people through an intuitive, data-driven HR platform.

What you will do

  • Design and build Terraform repositories and IaC pipelines from scratch with integrated drift detection and policy enforcement.
  • Develop AI-driven GitHub Actions pipelines for automated code review and intelligent deployment decisions.
  • Manage Kubernetes workloads across AWS accounts, ensuring zero downtime and automated scaling.
  • Define and enforce SLOs/SLIs and error budgets, and lead incident response and a postmortem culture.
  • Operate LLM APIs in production, managing rate limits, token quotas, and resilience patterns like circuit breakers.
  • Implement FinOps practices to optimize and provide visibility into AWS, LLM, and observability spending.
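
For candidates less familiar with the "circuit breaker" resilience pattern named above, here is a minimal Python sketch of the idea. It is illustrative only and is not taken from the company's codebase; the class name, the thresholds, and the injectable `clock` parameter are all assumptions made for the example. After a set number of consecutive failures the breaker "opens" and rejects calls immediately, then lets a single trial call through once a cool-down period has passed.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker rejects a call without attempting it."""


class CircuitBreaker:
    """Hypothetical, single-threaded circuit breaker for illustration."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_timeout = reset_timeout          # seconds to wait before a trial call
        self.clock = clock                          # injectable so tests can fake time
        self.failures = 0
        self.opened_at = None                       # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Still cooling down: fail fast instead of hitting the upstream API.
                raise CircuitOpenError("circuit open; skipping call")
            # Cool-down elapsed: "half-open", so let this one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many consecutive failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

In practice, `func` would wrap the LLM API request, and a production version would likely add thread safety, metrics, and a distinction between retryable and non-retryable errors.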

Requirements

  • 2-4 years of hands-on experience in DevOps, SRE, or infrastructure engineering within production SaaS environments.
  • Strong AWS expertise, including multi-account architecture, IAM, Lambda, SQS, SNS, and EKS.
  • Proven production experience with Kubernetes and stateful workload management.
  • Proficiency with Terraform module architecture and GitHub Actions for end-to-end CI/CD.
  • Working Python proficiency for scripting, internal tooling, and workflow automation.
  • Practical experience building observability stacks (metrics, logging, distributed tracing) from scratch.

Nice to have

  • Experience operating LLM APIs in production, including latency monitoring and cost attribution.
  • FinOps experience across cloud and AI infrastructure.
  • Experience introducing self-healing or auto-remediation patterns in production.

Culture & Benefits

  • Flexible hybrid working model with a work-from-home allowance.
  • Competitive compensation including a company share options plan (pre-IPO equity).
  • Wellness perks: Annual Headspace subscription and monthly Wolt allowance.
  • Health and support: Payment for sick leave from day one and transportation allowance.
  • Work-life balance: 4 additional "Bob balance days" per year and 2 Social Impact days for volunteering.
  • Dog-friendly office and temporary remote work from anywhere (up to 2 months after 6 months of employment).
