TL;DR
Senior AI Infrastructure Engineer: You will own the infrastructure that brings AI models to life in production, optimizing LLM inference, deploying real-time voice AI agents, and scaling GPU clusters. Focus areas: inference optimization, real-time video processing, model serving at scale, and GPU workload orchestration.
Location: Europe (must be based in Portugal or Germany).
Salary: €66,500 - €104,500 a year
Company
hirify.global is shifting healthcare from human-first to AI-first through its AI Care platform, making world-class healthcare available anytime, anywhere.
What you will do
- Design, build, and maintain the inference infrastructure that powers hirify.global's AI products, ensuring high throughput, low latency, and cost efficiency.
- Own the end-to-end deployment pipeline for AI models - from real-time computer vision to large language models.
- Architect and scale Kubernetes clusters for GPU-accelerated workloads, including autoscaling strategies and resource scheduling.
- Build and operate the infrastructure behind hirify.global's real-time AI agents, including WebRTC cluster provisioning.
- Drive inference scaling strategies and evaluate emerging AI infrastructure tools to keep hirify.global at the cutting edge.
- Collaborate with ML Engineers, Data Scientists, and Product teams to translate model requirements into robust, production-ready infrastructure.
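Much of the throughput/latency work described above comes down to request batching: grouping incoming requests so the GPU runs one large batch instead of many small calls. A minimal illustrative sketch (toy code, not hirify.global's actual stack; `infer_fn` stands in for a real model call) of dynamic batching with a size cap and a wait deadline:

```python
import time
from queue import Queue, Empty

class DynamicBatcher:
    """Collect incoming requests into batches: flush when the batch is
    full or when the oldest request has waited max_wait_s seconds.
    Batching amortizes per-call GPU overhead at a small latency cost."""

    def __init__(self, infer_fn, max_batch_size=8, max_wait_s=0.01):
        self.infer_fn = infer_fn          # runs one batch (stands in for a GPU call)
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s
        self.queue = Queue()

    def submit(self, request):
        self.queue.put(request)

    def run_once(self):
        """Drain up to one batch from the queue and run it."""
        batch, deadline = [], None
        while len(batch) < self.max_batch_size:
            timeout = None if deadline is None else max(0.0, deadline - time.monotonic())
            try:
                item = self.queue.get(timeout=timeout)
            except Empty:
                break  # waited long enough; flush what we have
            batch.append(item)
            if deadline is None:
                # start the wait clock when the first request arrives
                deadline = time.monotonic() + self.max_wait_s
        return self.infer_fn(batch) if batch else []

batcher = DynamicBatcher(infer_fn=lambda xs: [x * 2 for x in xs], max_batch_size=4)
for r in [1, 2, 3, 4, 5]:
    batcher.submit(r)
print(batcher.run_once())  # first full batch: [2, 4, 6, 8]
```

Production serving engines implement far more sophisticated variants (continuous batching, per-token scheduling), but the size-vs-wait trade-off is the same.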
Requirements
- 5+ years of experience in infrastructure engineering, with at least 2 years focused on AI/ML workloads in production environments.
- Strong experience with Kubernetes for orchestrating GPU-accelerated workloads.
- Hands-on experience with model serving and inference optimization frameworks for both real-time computer vision and large language model workloads.
- Solid understanding of LLM inference optimization techniques.
- Experience with Infrastructure as Code (Terraform or similar) and GitOps methodologies for managing complex, GPU-enabled environments.
- Fluent in English (written and oral).
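On the LLM inference-optimization requirement: the foundational technique is KV caching, which keeps each token's attention keys/values instead of re-encoding the whole prefix at every decode step. A toy sketch (operation counts only, no real model) of why it matters:

```python
class ToyDecoder:
    """Toy autoregressive decoder that counts key/value computations,
    illustrating the effect of a KV cache: without one, each new token
    re-encodes the entire sequence so far, giving O(n^2) total work."""

    def __init__(self):
        self.ops = 0  # number of key/value computations performed

    def decode_no_cache(self, prompt_len, new_tokens):
        for step in range(1, new_tokens + 1):
            # every step recomputes K/V for the full sequence so far
            self.ops += prompt_len + step
        return self.ops

    def decode_with_cache(self, prompt_len, new_tokens):
        self.ops += prompt_len   # prefill: compute K/V for the prompt once
        self.ops += new_tokens   # each decoded token adds one K/V entry
        return self.ops

print(ToyDecoder().decode_no_cache(100, 10))    # 1055 ops
print(ToyDecoder().decode_with_cache(100, 10))  # 110 ops
```

Real serving stacks build on this with paged KV-cache memory and prefix sharing, but the linear-vs-quadratic gap shown here is the core of it.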
Nice to have
- Experience with LLM serving engines such as vLLM, SGLang, or LLM-D.
- Experience with NVIDIA Triton Inference Server and TensorRT for real-time computer vision workloads.
- Experience with Istio or similar service mesh.
- Experience provisioning infrastructure on AWS, Azure, or GCP.
Culture & Benefits
- A stimulating, fast-paced environment with lots of room for creativity.
- Career development and growth, with a competitive salary.
- A flexible environment where you control your own hours, plus unlimited vacation.
- Remote or Hybrid work policy.