Infrastructure Support Engineer
Job description
TL;DR
Infrastructure Support Engineer (GPU Cloud): Ensuring the efficiency, reliability, and scalability of data centre infrastructure, with an emphasis on monitoring, troubleshooting, and customer support. Focus on handling tickets, following runbooks, collaborating with engineering teams, and identifying automation opportunities.
Location: UK, with availability to travel to customer locations for deployments, troubleshooting, and operational tasks.
Company
GPU cloud engineered for AI, providing cost-effective, high-performance infrastructure for AI startups and enterprises.
What you will do
- Join support duty rotation to handle tickets, alerts, and incidents, escalating appropriately and collaborating with engineering.
- Manage and resolve tickets using the ticketing system, keeping all parties informed with clear notes and communications.
- Follow runbooks for common issues, propose improvements, and contribute fixes; participate in monitoring, triage, and log capture.
- Deliver tasks and projects to timelines, flag blockers early, and share knowledge through documentation and training materials.
- Participate in incident reviews, identify automation opportunities, and collaborate with cross-functional teams including onsite operations.
- Take part in on-call/out-of-hours work when scheduled, and keep upskilling continuously.
Requirements
- Growth mindset: curious, dependable, collaborative, seeking feedback and investing in learning.
- Platform/DC fundamentals: servers, networks, storage, and virtualization, gained from a support or operations background.
- Linux fundamentals: CLI, systemd, filesystems, permissions, basic networking tools, troubleshooting.
- Networking basics: IP addressing, subnets, VLANs, routing, DNS, firewalls.
- Kubernetes exposure: core concepts, basic troubleshooting, runbooks.
- GPU awareness: basic diagnostics like nvidia-smi; observability: dashboards, alerts.
- Scripting/automation: Bash/Python snippets, Git; cloud/virtualization basics.
Nice to have
- Hands-on Kubernetes administration, operators, storage/networking add-ons.
- Deeper GPU/HPC: RDMA/InfiniBand, distributed workloads, NCCL.
- Infrastructure as Code: Ansible, Terraform; GitOps, CI/CD.
- Access/security tools like Teleport or Vault; relevant certifications.
Culture & Benefits
- Collaborative, supportive, innovative environment with real impact in a fast-growing AI tech startup.
- Highly competitive package (base + equity) with reviews every 12 months.
- Dynamic progression plan tailored to ambitions, with autonomy and flexibility.
- Human-first flexibility: shape your day around life's moments, in a culture of relentless innovation, ownership, and accountability.