Kafka Expert

Job type

project

English

Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

Kafka Expert (Kafka/ZooKeeper): Troubleshooting and modernizing an older on-prem Kafka cluster for real-time market quote / HFT tick data, with an emphasis on incident diagnosis, root causes of storage/disk saturation, and operational hardening. Focus on building monitoring/alerting, runbooks, and a practical upgrade roadmap (including a path away from ZooKeeper) while improving resilience to minimize RTO/RPO.

Company

hirify.global provides Kafka troubleshooting and modernization support for production streaming environments.

What you will do

Rapidly triage incidents by validating broker health, controller/ZK health, partition leadership/ISR, replication, rebalances, and disk saturation scope.
Diagnose why disk utilization jumped from ~10% to near 100% and identify root causes behind missing leaders, topic access failures, and invalid partition behavior.
Assess cluster configuration and harden it by reviewing broker/topic settings, partition distribution, rack awareness (if any), and failover behavior; document failure domains and bottlenecks.
Uplift observability by proposing/implementing Kafka monitoring (broker + ZK + OS/disk) with dashboards and alerting for lag, under-replication, disk/controller events, latency, GC, and network.
Deliver operational enablement: produce findings + recommendations/roadmap and create runbooks for safe operations (restarts, partition reassignment, capacity checks, backups, upgrades, recovery).
Optionally execute remediations (storage rebalancing, retention tuning, leader imbalance fixes) and plan Kafka upgrades, including migration to KRaft (ZooKeeper removal) and resilience improvements.
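
As an illustration of the triage step above (checking partition leadership and ISR), here is a minimal Python sketch that scans the text output of `kafka-topics.sh --describe` for partitions with no leader or a shrunken ISR. It is not part of the vacancy. The function name, the `quotes` topic, and the sample output are hypothetical, and the parsing assumes the usual tab-separated `Key: value` layout, where a missing leader is printed as `none` or `-1` depending on the Kafka version.

```python
import re

# Matches "Key: value" pairs on one line of `kafka-topics.sh --describe` output.
# The value may be empty (for example an empty ISR list).
FIELD_RE = re.compile(r"(\w+): ?(\S*)")

# Hypothetical sample output; topic name and broker ids are made up.
SAMPLE = (
    "Topic: quotes\tPartitionCount: 3\tReplicationFactor: 3\tConfigs: \n"
    "\tTopic: quotes\tPartition: 0\tLeader: 1\tReplicas: 1,2,3\tIsr: 1,2,3\n"
    "\tTopic: quotes\tPartition: 1\tLeader: none\tReplicas: 2,3,1\tIsr: \n"
    "\tTopic: quotes\tPartition: 2\tLeader: 3\tReplicas: 3,1,2\tIsr: 3\n"
)


def find_unhealthy_partitions(describe_output):
    """Return (topic, partition, issues) for partitions that need attention."""
    problems = []
    for line in describe_output.splitlines():
        fields = dict(FIELD_RE.findall(line))
        # Topic summary lines carry PartitionCount but no Partition/Leader keys.
        if "Partition" not in fields or "Leader" not in fields:
            continue
        replicas = [r for r in fields.get("Replicas", "").split(",") if r]
        isr = [r for r in fields.get("Isr", "").split(",") if r]
        issues = []
        # Assumption: older versions print -1, newer ones print "none".
        if fields["Leader"] in ("none", "-1", ""):
            issues.append("no leader")
        if len(isr) < len(replicas):
            issues.append("under-replicated")
        if issues:
            problems.append(
                (fields.get("Topic", "?"), int(fields["Partition"]), issues)
            )
    return problems


if __name__ == "__main__":
    for topic, partition, issues in find_unhealthy_partitions(SAMPLE):
        print(f"{topic}-{partition}: {', '.join(issues)}")
```

In practice the input would come from the real CLI output captured on a broker host. A read-only check of this kind fits the posting's emphasis on diagnosis before any remediation.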

Requirements

Proven hands-on experience operating Kafka in production, including high-throughput clusters.
Strong troubleshooting experience with partition leadership issues (missing leaders), ISR shrinkage, under-replicated partitions, and safe broker recovery without destructive “sledgehammer” actions.
Experience with ZooKeeper-based Kafka clusters and operational best practices.
Linux competence for disk/IO analysis, filesystem saturation, process/resource analysis, and networking basics.
Ability to produce clear, actionable documentation: findings, recommendations, and runbooks.
Strong communication skills working with a mixed engineering + IT team unfamiliar with Kafka.

Nice to have

Experience with Kafka monitoring stacks (JMX metrics pipelines, Prometheus/Grafana, lag monitoring, alerting design).
Experience with GUI/admin tooling and governance practices (RBAC, auditing approach, safer topic/config workflows).
Experience planning Kafka upgrades/migrations, including evaluating KRaft readiness and risk.
Familiarity with market data/trading workloads and latency-sensitive pipelines.
Experience with VMware-based on-prem operations and capacity planning.

Culture & Benefits

Freelance engagement focused on pragmatic improvements to an older on-prem Kafka environment with limited observability.
Clear deliverables: incident diagnosis, findings + recommendations/roadmap, and operational runbooks.
Hands-on collaboration with engineering and IT to transfer Kafka troubleshooting and day-to-day operations knowledge.
Resilience goal: minimize RTO/RPO (target: as low as practical, possibly a maximum data-loss tolerance of ~1 minute).

Hiring process

Review the current Kafka incident symptoms and cluster context, then align on triage scope and modernization priorities.
Deliver a findings report and recommendations/roadmap, followed by optional execution of selected remediations and upgrade planning.
