TL;DR
Sr. HPC Systems Engineer: Administer and manage HPC clusters, storage systems, and high-speed networks that support hirify.global engineering applications, with an emphasis on installing and integrating Linux-based compute clusters. The role focuses on troubleshooting complex systems, automating recurring tasks, and ensuring reliability in a fast-paced environment.
Location: Hawthorne, CA
Salary: $160,000–$220,000 per year
Company
hirify.global develops advanced technologies for space exploration with the ultimate goal of enabling human life on Mars.
What you will do
- Administer and manage HPC clusters, storage systems, and high-speed networks.
- Provide application support to hirify.global employees across engineering disciplines.
- Install and integrate Linux-based compute clusters.
- Write instructional documentation and convey technical ideas in non-technical terms.
Requirements
- 5+ years of hands-on experience with client and server hardware/software, management tools, enterprise networking, virtualization, and security technologies.
- Bachelor's degree in computer science, engineering, math, or a scientific discipline with 5+ years of systems engineering experience; OR 7+ years of professional experience building software in lieu of a degree.
- Experience with Linux.
- Must be a U.S. citizen or national, U.S. lawful permanent resident (aka green card holder), Refugee under 8 U.S.C. § 1157, or Asylee under 8 U.S.C. § 1158 due to ITAR requirements.
Nice to have
- 5+ years of professional experience building, deploying, and troubleshooting Linux systems.
- Experience with scripting languages (Bash, Python) to automate tasks.
- Experience building, deploying, and troubleshooting HPC clusters.
- Familiarity with cluster resource managers (Slurm, PBS, LSF).
- Experience with monitoring and alerting technologies (Prometheus, Grafana, Nagios).
- Familiarity with scientific and engineering computing (CFD, FEA).
- Familiarity with ML frameworks (PyTorch, TensorFlow), GPU usage, and CUDA.
- Experience with containers (Docker, Podman, Singularity) and automated configuration management (Puppet, Ansible).
Culture & Benefits
- Comprehensive medical, vision, and dental coverage.
- Access to a 401(k) retirement plan.
- Eligibility for long-term incentives (company stock, stock options) and potential discretionary bonuses.
- Paid parental leave, 3 weeks of paid vacation, and 10+ paid holidays per year.
- Work in a fast-paced, challenging environment with mission-critical and sensitive systems.