Lead Systems Reliability Engineer (Linux & Distributed Systems)
Job description
TL;DR
Lead Systems Reliability Engineer (Linux & Distributed Systems): Build and maintain a high-scale, data-driven advertising platform, with an emphasis on performance engineering, hardware optimization, and distributed-systems reliability. The focus is on tuning Linux kernels, benchmarking next-generation hardware, and designing automation for stateful systems at massive scale.
Location: London
Company
The world’s leading independent platform for digital advertising, helping brands reach audiences across the open internet.
What you will do
- Lead engineering teams to plan and manage global work streams, systems, and data structures across cloud and traditional datacenters.
- Design and improve infrastructure automation tailored for stateful systems at scale.
- Own the operations for Linux-based systems running Aerospike, Kafka, and MongoDB.
- Review new use cases and serve as a technical point of contact within an on-call rotation.
- Benchmark and analyze next-generation hardware offerings to optimize system throughput.
Requirements
- Deep expertise in the Linux operating system.
- Proven leadership experience and the ability to mentor other engineers.
- Advanced troubleshooting skills, using the scientific method to isolate CPU and I/O bottlenecks.
- Must be based in London.
Nice to have
- Experience with physical on-prem hardware internals and operations.
- Background in performance testing and tuning.
- Knowledge of relational or NoSQL databases.
- Experience with Ansible, PyInfra, Chef, Prometheus, or Kubernetes.
- Proficiency in Python, Ruby, Rust, Bash, Golang, or C#.
Culture & Benefits
- Opportunity to work with bleeding-edge hardware, including nodes with 300 TB of NVMe storage and 512 CPU cores.
- Direct collaboration with major vendors such as AMD to run proofs of concept (PoCs) and optimize their technology.
- Inclusive and diverse global team environment that encourages curiosity and critical thinking.
- Work on systems at massive scale, processing over 5 million queries per second (QPS) per node.