Company hidden
Updated 6 days ago

Lead Staff Systems Reliability Engineer (Linux & Distributed Systems)

Work format
onsite
Employment type
fulltime
Grade
lead
English
b2
Country
UK

Job description

TL;DR

Lead Staff Systems Reliability Engineer (Linux & Distributed Systems): Lead a team that builds and maintains a data-driven platform on Aerospike, Kafka, and MongoDB, achieving p99 latency under 1 ms, with an emphasis on performance tuning, hardware optimization, and automation at global scale. The role focuses on benchmarking next-generation hardware, designing automation for distributed systems, troubleshooting bottlenecks, and pushing the limits of infrastructure endurance.

Location: London

Company

hirify.global is a global technology company and the world’s leading independent platform for digital advertising across streaming TV, podcasts, mobile apps, news, and more.

What you will do

  • Lead a team that influences, manages, and plans work streams, systems, and data structures at scale across global infrastructure providers.
  • Build and improve infrastructure automation for stateful systems at scale.
  • Own operations for Linux-based systems running Aerospike, Kafka, and MongoDB.
  • Serve as the point of contact for new use cases, answer questions, and participate in the on-call rotation.
  • Build NoSQL expertise through training and become a subject matter expert.
  • Benchmark and analyze next-generation hardware offerings.

Requirements

  • Strong Linux operating system skills.
  • Leadership experience and ability to mentor.
  • Advanced troubleshooting skills: isolation techniques, the scientific method, and bottleneck identification (CPU, I/O).
  • Empathetic, objective critical thinker who understands the 'why' behind objectives.
  • Open to diverse perspectives, creative problem-solving, and adaptability.

Nice to have

  • Physical hardware (on-prem) internals, management, and operation.
  • Performance testing and tuning.
  • Databases (relational or NoSQL).
  • Ansible/PyInfra/Chef, Prometheus, Kubernetes, Python/Ruby/Rust/Bash/Golang/C#.

Culture & Benefits

  • Work on cutting-edge challenges such as 5 million QPS to NVMe in Aerospike, and clusters with 300 TB of NVMe, 3 TB of RAM, and 512 cores.
  • Collaborate with vendors like AMD and NoSQL providers on PoCs and optimizations.
  • Global team encouraging curiosity, diverse perspectives, learning, and innovation.
  • Impact worldwide through transparent, effective, responsible advertising technology.
