Company hidden
12 hours ago

Principal Software Engineer (AI)

Work format
remote (Global)
Employment type
full-time
Grade
principal
English
B2
Country
UK
Vacancy from Hirify.Global, a list of international tech companies

Job description

TL;DR

Principal Software Engineer (AI): Lead the technical development of Fleet Manager, a workflow automation platform for GPU node and network switch lifecycle management, with an emphasis on architecture, delivery, and establishing engineering standards. The role focuses on designing and building complex workflow orchestration systems and integrating with critical infrastructure tooling for AI compute.

Location: Fully remote, global.

Company

hirify.global is the GPU cloud engineered for AI, providing cost-effective, high-performance infrastructure for AI start-ups and large enterprise customers.

What you will do

  • Lead technical architecture and roadmap for Fleet Manager's workflow automation systems.
  • Own end-to-end delivery of device provisioning, validation, testing, and remediation workflows at scale.
  • Design and build workflow orchestration systems for GPU node and network switch lifecycle management.
  • Establish engineering standards for reliability, observability, and operational excellence.
  • Mentor senior engineers and drive technical leadership through design reviews and hands-on collaboration.
  • Partner with Infrastructure, Platform, and SRE teams to translate operational needs into robust, scalable automation.

Requirements

  • 10+ years of software engineering experience building and operating production systems, with proven technical leadership in infrastructure automation or workflow tooling.
  • Strong Python engineering fundamentals with experience leading complex, multi-service distributed systems.
  • Motivated by building distributed systems at scale and by infrastructure reliability, scalability, security, and continuous improvement.
  • Track record of owning technical roadmaps and delivering large-scale automation systems.
  • Deep understanding of operational excellence: SLOs, monitoring, alerting, and incident response.
  • Excellent communication skills to build consensus with stakeholders.

Nice to have

  • Experience with workflow orchestration tools like Temporal, Airflow, or Prefect.
  • Hands-on experience with infrastructure tooling like DCIMs, NetBox, OpenStack, or ERP systems.
  • Experience with bare metal provisioning and automation (MAAS, Ironic, IPMI, PXE boot).
  • GPU infrastructure experience, including health monitoring, burn-in testing, or cluster management.
  • Deep knowledge of Kubernetes, Infrastructure as Code (Terraform, Pulumi), AWS, and GCP.
  • Experience with HPC and networking, including datacenter topology or high-performance interconnects.

Culture & Benefits

  • Collaborative, supportive, and innovative remote-first environment.
  • Highly competitive package (base + equity) with reviews every 12 months.
  • Opportunity to join a fast-growing tech startup pushing boundaries in cutting-edge AI.
  • Dynamic progression plan tailored to your ambitions.
  • Human-first flexibility, with the autonomy to shape your day around life's moments.
  • Committed to fostering an inclusive, diverse, and equitable workplace.

The vacancy text is reproduced without changes