Lead Staff Systems Reliability Engineer (Linux & Distributed Systems)
Job description
TL;DR
Lead Staff Systems Reliability Engineer (Linux & Distributed Systems): Lead a team that builds and maintains a data-driven platform on Aerospike, Kafka, and MongoDB, achieving p99 latency under 1 ms, with an emphasis on performance tuning, hardware optimization, and automation at global scale. The role focuses on benchmarking next-generation hardware, designing automation for distributed systems, troubleshooting bottlenecks, and pushing infrastructure to its endurance limits.
Location: London
Company
The company is a global technology company and the world’s leading independent platform for digital advertising across streaming TV, podcasts, mobile apps, news, and more.
What you will do
- Lead a team to influence, manage, and plan work streams, systems, and data structures at scale across global infrastructure providers.
- Build and improve infrastructure automation for stateful systems at scale.
- Own operations for Linux-based systems running Aerospike, Kafka, and MongoDB.
- Serve as the point of contact for new use cases, answer questions, and participate in the on-call rotation.
- Build NoSQL expertise through training and become a subject matter expert.
- Benchmark and analyze next-generation hardware offerings.
Requirements
- Strong Linux operating system skills.
- Leadership experience and ability to mentor.
- Advanced troubleshooting using isolation techniques, the scientific method, and bottleneck identification (CPU, I/O).
- Empathetic, objective critical thinker who understands the 'why' behind objectives.
- Open to diverse perspectives, with creative problem-solving skills and adaptability.
Nice to have
- Physical hardware (on-prem) internals, management, and operation.
- Performance testing and tuning.
- Databases (relational or NoSQL).
- Ansible/PyInfra/Chef, Prometheus, Kubernetes, Python/Ruby/Rust/Bash/Golang/C#.
Culture & Benefits
- Work on cutting-edge challenges such as driving 5MM QPS to NVMe in Aerospike and running clusters with 300TB of NVMe, 3TB of RAM, and 512 cores.
- Collaborate with vendors like AMD and NoSQL providers on PoCs and optimizations.
- Global team encouraging curiosity, diverse perspectives, learning, and innovation.
- Impact worldwide through transparent, effective, responsible advertising technology.