TL;DR
Senior Systems Engineer (AI): Lead hands-on bringup of network clusters in data center environments, with an emphasis on node, rack, and network deployment validation. Focus on tuning high-speed fabrics, debugging performance issues, and building repeatable, scalable infrastructure processes.
Location: Must be based in the US and comfortable working onsite in data center environments.
Company
hirify.global is a startup building next-generation AI infrastructure, focused on delivering performant and scalable network clusters for frontier AI workloads.
What you will do
- Execute end-to-end bringup of network nodes and racks from installation to production.
- Validate BIOS, BMC, firmware configurations, and overall network health.
- Bring up and validate high-speed network fabrics including InfiniBand, RoCE, and Ethernet.
- Configure leaf/spine connectivity and run cluster-wide burn-in and stress testing.
- Troubleshoot hardware, firmware, and fabric-level issues to optimize performance.
- Automate provisioning processes and improve deployment documentation.
Requirements
- 5–8+ years in infrastructure engineering, hardware deployment, or data center operations.
- Hands-on experience deploying GPU servers such as HGX or DGX platforms.
- Deep understanding of high-speed networking fabrics (InfiniBand, RoCE, Ethernet).
- Strong Linux systems knowledge.
- Proven ability to troubleshoot distributed systems performance issues.
- Must be able to work onsite in data center environments as required.
Nice to have
- Experience in AI/ML infrastructure or HPC environments.
- Familiarity with NCCL, CUDA, and RDMA.
- Proficiency in automation with Python, Bash, Ansible, or Terraform.
- Experience managing high-density power and cooling environments.
Culture & Benefits
- Fast-paced startup environment with an emphasis on urgency and ownership.
- Opportunity to build core AI infrastructure from the ground up.
- Direct collaboration with networking, systems software, and data center teams.
- Focus on developing repeatable, high-scale systems and processes.