AI Infrastructure Engineer
Мэтч & Сопровод
Для мэтча с этой вакансией нужен Plus
Описание вакансии
TL;DR
AI Infrastructure Engineer (DevOps/SRE): Building and operating production infrastructure for autonomous AI agents with an accent on observability, controllability, and multi-cloud reliability. Focus on designing infrastructure patterns for non-deterministic systems and hardening IaC stacks across AWS, GCP, and Azure.
Location: New York City
Salary: $180k – $300k
Company
transforms critical institutions in healthcare, manufacturing, and energy by deploying frontier applied AI technology.
What you will do
- Define infrastructure patterns for multi-agent AI systems to ensure they are observable, controllable, and recoverable.
- Own and evolve the IaC stack using Terraform and Kubernetes across AWS, GCP, and Azure.
- Build observability primitives to trace agent decisions and execution paths.
- Design and maintain CI/CD pipelines to provide fast and trustworthy feedback loops.
- Establish operational foundations for monitoring, alerting, and incident response for autonomous systems.
- Meet reliability and compliance requirements for regulated environments, including SOC 2 and HIPAA.
Requirements
- 5+ years of experience building and operating production infrastructure in DevOps or SRE roles.
- Strong hands-on experience with Terraform.
- Deep experience with at least one major cloud provider (AWS, GCP, or Azure), covering networking, IAM, and cost management.
- Solid production experience with Docker and Kubernetes.
- Experience designing and maintaining CI/CD pipelines (GitHub Actions, GitLab CI, or similar).
- Scripting proficiency in Python, Bash, or similar languages.
Nice to have
- Multi-region and multi-cloud experience across two or more providers.
- Experience with single-tenant or on-prem deployments alongside multi-tenant SaaS.
- Familiarity with GitOps patterns, progressive delivery, and the Grafana stack (Prometheus, Loki).
- Knowledge of compliance frameworks like HIPAA and SOC 2.
- Background supporting the transition of ML or research workflows into production.
Culture & Benefits
- Equity offerings in a fast-growing AI company.
- High-agency environment with real ownership over foundational technical decisions.
- Culture of "Intensity with Kindness," emphasizing candor, excellence, and vulnerability.
- Outcome-oriented work environment without strict 9-5 hour monitoring.
- Opportunity to define new SRE and infrastructure standards for autonomous AI agents.
Будьте осторожны: если работодатель просит войти в их систему, используя iCloud/Google, прислать код/пароль, запустить код/ПО, не делайте этого - это мошенники. Обязательно жмите "Пожаловаться" или пишите в поддержку. Подробнее в гайде →