Head of Infrastructure Support (AI)
Job description
TL;DR
Head of Infrastructure Support (AI): Lead the US support team to ensure reliable, high-performance infrastructure for AI-focused customers, with an emphasis on operational excellence, team mentorship, and incident management. The role bridges strategic direction and frontline execution, optimizes Kubernetes and Linux-based environments, and drives continuous improvement in service delivery.
Location: Must be based in the US
Company
The company is a GPU cloud provider engineered for AI. It delivers high-performance infrastructure that supports rapid innovation and strategic business outcomes for AI startups and enterprises.
What you will do
- Manage day-to-day operations and people leadership for the US Infrastructure Support team.
- Oversee ticket queue management, ensuring SLA adherence and timely resolution of complex incidents.
- Collaborate with Senior Engineers on technical improvements, operational tooling, and high-impact troubleshooting.
- Drive continuous improvement by refining runbooks, dashboards, and automation workflows.
- Set team objectives, manage shift planning, and conduct performance reviews to foster professional growth.
- Ensure compliance with ITIL processes and maintain high standards for security and operational documentation.
Requirements
- Must be based in the US
- Proven experience leading or managing engineers in an operational support environment.
- Strong Linux systems engineering experience and troubleshooting skills in production.
- Experience operating and debugging Kubernetes environments and distributed systems.
- Solid understanding of networking fundamentals (L2/L3, routing, load balancing) and datacenter technologies.
- Proficiency in scripting (Bash, Python) and Infrastructure as Code tools (Ansible, Terraform).
- Understanding of ITIL processes and SRE practices.
Nice to have
- Experience with GPU platforms (NVIDIA/AMD) and performance diagnostics.
- Exposure to HPC or distributed workloads and to interconnect technologies such as RDMA and InfiniBand.
- Experience with CI/CD or GitOps tooling.
- Experience working in multi-region environments.
Culture & Benefits
- Competitive compensation package including base salary and equity with annual reviews.
- Opportunity to work at a fast-growing tech startup in the cutting-edge AI infrastructure space.
- Human-first flexibility with a remote-first culture that trusts employees to manage their own time.
- Dynamic progression plan tailored to individual ambitions and career growth.
- Collaborative and innovative environment focused on transparency and ownership.