TL;DR
Senior Software Engineer (AI): Building and optimizing Python-based workflow automation systems for GPU node and network switch lifecycle management at scale, with an emphasis on device provisioning, burn-in testing, network configuration, and hardware health validation. Focus on designing foundational platform components, integrating with datacenter infrastructure, and driving technical strategy for reliability and operational excellence in distributed systems.
Location: Remote (Global)
Company
hirify.global is the GPU cloud engineered for AI, providing cost-effective, high-performance infrastructure for AI start-ups and large enterprise customers.
What you will do
- Build Python-based workflow automation systems for GPU node and network switch lifecycle management at scale.
- Design foundational platform components with established software patterns.
- Implement device provisioning, burn-in testing, network configuration, and hardware health validation workflows.
- Integrate with datacenter infrastructure management systems, cloud orchestration platforms, and bare metal provisioning tools.
- Build distributed workflow orchestration systems to coordinate complex automation tasks.
- Drive technical strategy for reliability, observability, incident response, and operational excellence.
Requirements
- 5+ years of software engineering experience building and operating production systems, with a focus on infrastructure automation or workflow tooling.
- Strong proficiency in Python.
- Proven ability to build distributed systems at scale, with an emphasis on infrastructure reliability, scalability, and security.
- Ability to quickly grasp system design tradeoffs and to work effectively with rapidly evolving software systems.
- Experience taking automation systems from ambiguous requirements to production, including day-2 operations (monitoring, incident response, performance optimisation).
- Excellent communication skills to build consensus with stakeholders.
Nice to have
- Experience with workflow orchestration tools like Temporal, Airflow, or Prefect.
- Hands-on experience with infrastructure tooling like DCIMs, NetBox, OpenStack, or ERP systems.
- Experience with bare metal provisioning and automation (MAAS, Ironic, IPMI, PXE boot, network automation).
- Experience building hardware lifecycle automation.
- GPU infrastructure experience (health monitoring, burn-in testing, cluster management).
- Deep knowledge of Kubernetes, Infrastructure as Code (Terraform, Pulumi), AWS, and GCP.
Culture & Benefits
- Collaborative, supportive, and innovative remote-first environment where contributions have a real impact.
- Highly competitive compensation package (base + equity) with reviews every 12 months.
- Opportunity to join a fast-growing tech startup and contribute to cutting-edge AI technology.
- Dynamic progression plan tailored to individual ambitions, with full support for growth.
- Human-first flexibility: hirify.global trusts its team members with the autonomy to shape their day around life's moments.