TL;DR
Senior Cloud and DevOps Engineer (AI/ML): Design, optimize, and secure scalable cloud platforms for AI model training and deployment, with an emphasis on AWS infrastructure management, CI/CD pipelines, and cost control. The role also covers troubleshooting incidents, ensuring zero-downtime deployments, and collaborating with AI Ops teams on platform performance and reliability.
Location: Hybrid work model based in Barcelona, Spain, with remote work possibilities.
Company
hirify.global, now part of EPAM Systems, is a Digital Accelerator with over 20 years of experience and a multicultural, startup-minded environment.
What you will do
- Manage cloud infrastructure and optimize costs, particularly in AWS environments using Terraform and Python.
- Design, develop, and maintain CI/CD pipelines and infrastructure for AI model training and deployment.
- Ensure platform scalability, efficient resource utilization, high availability, and resilience in cloud architectures.
- Conduct technical research, analyze findings, and propose optimized architectural solutions.
- Troubleshoot incidents and guarantee smooth, zero-downtime deployments.
- Collaborate closely with AI Ops and technical teams to ensure platform performance, stability, and reliability.
Requirements
- Minimum of 7 years of professional experience in similar roles (Cloud Engineer, DevOps Engineer, MLOps Engineer, or related positions).
- Advanced hands-on experience with AWS infrastructure management, networking, security, and cost optimization.
- Strong expertise in Terraform for Infrastructure as Code (IaC).
- Solid experience using Python for automation and scripting.
- Proven experience designing and maintaining CI/CD pipelines in production environments.
- Experience managing infrastructure for machine learning model training and deployment.
- Advanced English level, both spoken and written.
Nice to have
- Experience with microservices-based architectures and containerization (Docker, Kubernetes).
- Knowledge of observability practices (monitoring, logging, alerting).
- Experience with cost governance and optimization strategies in complex cloud environments.
- Familiarity with MLOps practices and the end-to-end lifecycle of ML models in production.
- AWS or DevOps/MLOps-related certifications.
Culture & Benefits
- Permanent contract with a competitive salary.
- Flexible work model and remote work possibilities.
- Personalized career plan and continuous training (certifications, English).
- Participation in stable projects with high technical impact.
- Flexible working hours with a strong focus on work-life balance.
- Social benefits tailored to your needs.