Company hidden
Posted 4 hours ago

Staff Software Engineer (AI Infrastructure)

325,000 - 485,000 GBP
Work format: hybrid
Employment type: full-time
Grade: senior
English: B2
Country: UK

Job description

TL;DR

Staff Software Engineer (AI Infrastructure): Build and scale compute infrastructure for AI models, with an emphasis on node lifecycle management, automated hardware repair, and cluster orchestration. The role focuses on optimizing accelerator capacity (GPU/TPU), designing high-availability distributed systems, and scaling infrastructure to hundreds of thousands of hosts.

Location: Hybrid in London, UK (must be in office at least 25% of the time)

Salary: £325,000 - £485,000

Company

hirify.global is a public benefit corporation dedicated to creating reliable, interpretable, and steerable AI systems for the benefit of society.

What you will do

  • Own the technical strategy and roadmap for node lifecycle management, including ingestion, bring-up, health checking, and automated repair.
  • Drive cross-team initiatives to scale AI clusters across multiple clouds and accelerator families.
  • Design and operate systems that automatically detect and remediate unhealthy hardware to minimize capacity loss.
  • Define high-level infrastructure architecture and solve complex technical challenges directly or through other engineers.
  • Collaborate with cloud providers and internal research/product teams to shape long-term compute and data strategy.
  • Provide technical mentorship and coaching to support the growth of other engineers.

Requirements

  • Deep expertise in distributed systems, reliability, and cloud platforms (Kubernetes, IaC, AWS/GCP/Azure).
  • Strong proficiency in Rust, Go, or Python and expertise with Terraform.
  • Hands-on experience with machine learning accelerators such as GPUs, TPUs, or Trainium.
  • Track record of leading complex, multi-quarter technical initiatives spanning multiple teams.
  • Must be based in or able to work from the London office at least 25% of the time.
  • Bachelor’s degree or equivalent professional experience in a relevant field.

Nice to have

  • Experience managing hyperscale compute infrastructure (10K+ nodes).
  • Deep knowledge of Kubernetes internals (scheduler, autoscaler, kubelet, Karpenter) or orchestration systems like Borg or Mesos.
  • Low-level systems experience with kernel, virtualization, device drivers, or firmware.
  • Familiarity with high-performance networking (EFA, RDMA, InfiniBand) for distributed ML workloads.
  • Contributions to relevant open-source projects (e.g., Kubernetes, Linux kernel).

Culture & Benefits

  • Competitive compensation package including optional equity donation matching.
  • Generous vacation and parental leave.
  • Flexible working hours and a collaborative office environment.
  • Strong focus on AI safety and commitment to team diversity and representation.
  • Visa sponsorship available for qualified candidates.
