Company hidden
Posted 13 days ago

Senior Solutions Engineer (AI Infrastructure)

Work format: remote (USA only)
Employment type: full-time
Grade: senior
English: B2
Country: US

Job description

TL;DR

Senior Solutions Engineer (AI Infrastructure): Build customer solution architectures for large-scale AI, HPC, analytics, and data-intensive workloads, with an emphasis on GPU clusters, high-performance storage and networking, Kubernetes platforms, and distributed training/inference environments. The role focuses on technical discovery, PoC planning and execution, and translating complex infrastructure requirements into deployment guidance that drives production success.

Location: Remote (United States)

Company

hirify.global provides infrastructure solutions for large-scale AI and data-intensive workloads.

What you will do

  • Lead technical discovery with customers across infrastructure, platform, ML, data, and executive stakeholders.
  • Design architectures for large-scale AI, HPC, analytics, and enterprise data workloads.
  • Evaluate infrastructure tradeoffs involving GPUs, storage, networking, orchestration, and data movement.
  • Design and execute proofs of concept to validate performance, scale, reliability, and business value.
  • Debug customer issues across Linux, storage, networking, Kubernetes, schedulers, GPUs, and application workloads.
  • Create technical assets (demos, runbooks, field guidance) and support production deployment planning.

Requirements

  • 8 to 12+ years of technical experience, including significant hands-on infrastructure work.
  • Experience building, operating, or architecting production platform infrastructure.
  • Strong understanding of Linux internals and distributed systems (including Paxos and Raft), plus storage and networking implementation details.
  • Experience with one or more: GPU infrastructure, large-scale HPC, Kubernetes platforms, MLOps, storage systems, cloud infrastructure, or data platforms.
  • Ability to communicate credibly with engineers, architects, technical executives, and business stakeholders.
  • Strong discovery, problem-solving, and systems debugging skills in ambiguous, fast-moving environments.

Nice to have

  • Experience with large-scale GPU clusters, distributed training/inference, and AI platforms.
  • Experience with petabyte-scale storage and high-performance data systems.
  • Experience with orchestration/scheduling tools such as Kubernetes, Slurm, Ray, or Spark.
  • Domain expertise with systems like Lustre, Ceph, Weka, BeeGFS, GPFS, VAST, object storage, or distributed filesystems.
  • Experience with InfiniBand/RoCE/RDMA and high-performance Ethernet, plus NVIDIA/Mellanox networking.
  • Hands-on experience with CUDA/NCCL/DCGM/GPUDirect, checkpointing, dataset staging, or model-serving infrastructure.

Culture & Benefits

  • Customer-facing technical role focused on deep infrastructure problem solving and clear solution design.
  • Work end-to-end from discovery and evaluation through deployment planning and production success.
  • Operate without a rigid playbook in fast-moving, ambiguous environments.
  • Partner with product and engineering to feed field feedback into the roadmap.
