Staff Engineer, HPC Systems Software (AI)

$100,000 - $500,000

Work format

hybrid

Employment type

full-time

Grade

senior

English

Country

US/Canada

This vacancy is from Hirify Global, a list of international tech companies

Job description

TL;DR

Staff Engineer, HPC Systems Software (AI): Architecting and maintaining the operating system foundation for global hardware design infrastructure, with an emphasis on bare-metal provisioning and configuration-as-code. Focus on scaling OS lifecycle management across hundreds of compute nodes and optimizing Linux kernel performance for AI hardware development.

Location: Hybrid; must be based in Austin (TX), Santa Clara (CA), or Toronto (Canada)

Salary: $100k - $500k

Company

hirify.global is a startup leading the industry in cutting-edge AI technology and high-performance RISC-V CPUs.

What you will do

Design and maintain automated OS deployment pipelines for global bare-metal HPC clusters.
Manage large-scale configuration using Ansible to ensure consistency across compute infrastructure.
Deploy RHEL and Ubuntu systems and manage their lifecycle across diverse hardware platforms.
Implement infrastructure-as-code for repeatable, version-controlled system configurations.
Troubleshoot OS-level issues and optimize kernel parameters to resolve performance bottlenecks.
Collaborate with hardware design teams to standardize system configurations and development environments.

Requirements

Experience with RHEL and Ubuntu administration in HPC or large-scale compute environments.
High proficiency in Ansible for automation across hundreds of nodes.
Experience with bare-metal provisioning systems such as MAAS, Foreman, Cobbler, or Warewulf.
Deep understanding of Linux internals, networking, kernel tuning, and performance troubleshooting.
Familiarity with HPC cluster architecture and infrastructure-as-code practices.
Must be eligible to access U.S. export-controlled technology (EAR compliance).

Nice to have

Hands-on experience with IBM Spectrum LSF or similar HPC workload managers.
Experience integrating with commercial HPC storage platforms such as Pure Storage, Weka, or Vast Data.
Exposure to EDA tools and hardware design workflows in semiconductor development.
Experience with container technologies including Docker, Singularity, or Podman.
Experience monitoring clusters with Prometheus, Grafana, and custom tooling.
Python and bash scripting for production-level infrastructure automation.

Culture & Benefits

Highly competitive compensation package including base and variable targets.
Collaborative environment with a focus on curiosity and solving hard technical problems.
Opportunity to work on revolutionary AI platforms and RISC-V CPU architecture.
Equal opportunity employer.
