Company hidden

3 days ago

Site Reliability Engineer (AI)

Work format

remote (Europe only)/hybrid

Employment type

fulltime

Grade

senior

English

Country

France/UK/Spain +4 more

Relocation

France

Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

Site Reliability Engineer (AI): Designing and maintaining scalable, fault-tolerant infrastructure for a cloud platform, with an emphasis on bare metal systems and distributed reliability. Focus on building abstraction layers between science and infrastructure, optimizing CI/CD for large training runs, and ensuring high availability of AI services.

Location: Primarily based in Paris or London. Remote candidates must be based in France, UK, Germany, Belgium, Netherlands, Spain, or Italy and are required to visit the Paris office for onboarding and at least 3 days per month.

Company

hirify.global is a pioneering AI company focused on democratizing high-performance, open-source, and cutting-edge AI models and solutions for enterprise and individual use.

What you will do

Design, build, and maintain scalable, highly available, and fault-tolerant infrastructure.
Operate systems and troubleshoot production issues, including on-call responses and infrastructure scaling.
Implement monitoring, alerting, and incident response systems to minimize downtime.
Develop CI/CD workflows and containerization tools for customer-facing APIs and large-scale training runs.
Build a cloud platform abstraction layer between AI science, engineering, and infrastructure.
Collaborate with security teams to ensure infrastructure compliance and best security practices.

Requirements

Master’s degree in Computer Science, Engineering, or a related field.
5+ years of experience in a DevOps/SRE role.
Strong experience with bare metal infrastructure and highly available distributed systems.
Hands-on expertise with Docker, Kubernetes, and IaC tools like Terraform or CloudFormation.
Proficiency in Python, Go, and Bash.
Experience with observability tools such as Prometheus, Grafana, ELK Stack, or Datadog.

Nice to have

Experience in an AI/ML environment.
Knowledge of high-performance computing (HPC) systems and workload managers (Slurm).
Experience with AI-oriented solutions like Fluidstack, Coreweave, or Vast.

Culture & Benefits

Competitive salary and equity.
Comprehensive health insurance and private pension plan.
Transportation and sport allowances, plus meal vouchers.
Generous parental leave policy.
Visa sponsorship available.
Culture of rigor, audacity, and low-ego collaboration.

Hiring process

Introduction call (30 min) and Manager interview (30 min).
Technical interviews covering System Design (45 min) and a Deep Dive (60 min).
Culture-fit discussion (30 min) and reference checks.
