This vacancy is archived
DevOps Engineer (HPC)
Job description
TL;DR
DevOps Engineer (HPC): Building and scaling a Kubernetes-based platform for large-scale AI and HPC workloads, with an emphasis on infrastructure reliability, automation, and security. Focus on designing fault-tolerant systems, driving infrastructure innovation, and ensuring high availability in a fast-paced startup environment.
Location: Must be based in or willing to relocate to France or the UK, or remote from specified European countries with mandatory periodic office visits in Paris.
Company
The company develops high-performance, open-source AI models and infrastructure, aiming to democratize AI for enterprise and cloud environments.
What you will do
- Design, build, and operate a scalable Kubernetes platform for AI and HPC workloads, ensuring performance and security.
- Manage the full lifecycle of cluster operations, including automation, monitoring, and orchestration.
- Drive infrastructure innovation through tooling, CI/CD pipelines, and observability improvements.
- Implement zero-trust security models, including IAM and network access controls.
- Develop user-centric features to simplify operations for sysadmins and customers.
- Lead incident resolution with root-cause analysis to improve system resilience.
Requirements
- Experience in infrastructure engineering roles such as DevOps, SRE, or platform engineering.
- Proficiency in software development, preferably Golang, and Kubernetes internals.
- Hands-on experience with containerization, orchestration tools, and infrastructure-as-code (Terraform, CloudFormation).
- Knowledge of monitoring and observability tools such as Prometheus, Grafana, ELK, and Datadog.
- Experience with highly available distributed systems and reliability KPIs.
- Location: Must be based in or willing to relocate to France or the UK, or remote from specified European countries with mandatory office visits.
- Excellent problem-solving and communication skills.
Nice to have
- Experience with HPC workload managers (Slurm) and distributed storage systems (Lustre, Ceph).
- Contributions to open-source projects.
Culture & Benefits
- Competitive salary and equity.
- Health insurance and private pension plan.
- Transportation, sport, and meal allowances.
- Generous parental leave policy.
- Visa sponsorship available.