Senior SRE Engineer (HPC & Cloud)

Work format

Onsite

Employment type

Full-time

Level

Senior

English

Country

Taiwan

Job description

TL;DR

Senior SRE Engineer (HPC & Cloud): Manage large-scale Linux environments and HPC clusters, with an emphasis on automation, cloud infrastructure, and storage optimization. The role also covers building internal AI platforms, optimizing CI/CD pipelines, and ensuring the reliability of high-performance compute services.

Location: Taiwan

Company

hirify.global is a quantitative trading firm specializing in high-frequency trading and advanced research.

What you will do

Manage large-scale Linux environments, focusing on troubleshooting and deep root-cause analysis.
Develop maintainable automation using Bash, Ansible, and Python for infrastructure operations.
Operate HPC clusters (Slurm) and maintain high-performance storage solutions like Lustre and NAS.
Manage multi-cloud infrastructure across AWS, GCP, and Alibaba Cloud using Terraform and AWS CDK.
Build and operate containerized environments with Docker, Amazon ECS, and Kubernetes (Amazon EKS), and design GitLab CI/CD pipelines.
Develop internal AI platforms, chatbots, and agents using LangChain, Amazon Bedrock, and Elasticsearch-based RAG.

Requirements

5+ years of hands-on Linux systems administration and infrastructure operations experience.
Deep knowledge of Linux internals including process, memory, filesystem, networking, and cgroups.
Proficiency in Bash/Shell scripting and Python for data processing and API services.
Solid experience with RAID, filesystem selection, and shared storage operations (NFS/SMB).
Experience with public cloud providers (AWS/GCP/Alibaba) and IaC tooling (Terraform/Ansible).
Ability to drive complex technical subsystems end-to-end with strong autonomy and minimal supervision.

Nice to have

Experience with HPC schedulers (Slurm, PBS, LSF) or parallel filesystems (Lustre, GPFS).
Advanced Linux performance analysis skills using eBPF, perf, or ftrace.
Database operations experience with MySQL or ClickHouse.
GPU server operations including NVIDIA driver management, CUDA toolkit, and Slurm GRES configuration.
LLM application development experience with LangChain or RAG.
