TL;DR
Senior Site Reliability Engineer (Hadoop/Kafka): Deploying, configuring, and maintaining big data stores and large-scale Linux infrastructure with an emphasis on reliability, scalability, and operational excellence. Focus on debugging complex distributed system issues, advancing the technology stack, and ensuring maximum uptime and predictable performance.
Location: Willing and able to work East Coast U.S. hours (9am–6pm EST)
Company
hirify.global is a fast-growing healthcare technology company that uses real-time data, machine learning, and programmatic automation to transform healthcare.
What you will do
- Deploy, configure, monitor, and maintain multiple big data stores with a strong focus on reliability and scalability.
- Manage large-scale Linux infrastructure to ensure maximum uptime and predictable performance.
- Develop and document system configuration standards, operational procedures, and best practices.
- Perform performance and reliability testing, including reviewing configuration and hardware specifications.
- Participate in incident response, root cause analysis, and drive long-term reliability improvements.
- Advance the technology stack with innovative ideas and pragmatic solutions.
Requirements
- Strong hands-on experience operating large-scale Linux infrastructure in production (Rocky Linux or equivalent).
- Deep practical knowledge of Apache Hadoop-based data platforms (HDFS architecture, Kerberos, operational lifecycle).
- Experience running Apache Kafka clusters in production, including KRaft-based setups.
- Proven ability to debug complex distributed system issues across storage, compute, and networking layers.
- Experience designing or improving automation, deployment, or GitOps-style workflows.
- Proficiency in scripting or automation (Python, Shell).
- Solid understanding of networking fundamentals (TCP/IP, DNS, load balancing, basic network security).
- Willing and able to work East Coast U.S. hours (9am–6pm EST).
Nice to have
- Experience with Trino as a large-scale analytical query engine and Apache Iceberg as a table format.
- Experience administering Percona XtraDB Cluster or other HA databases.
- Hands-on experience with Ceph or other distributed storage systems.
- Strong background in observability platforms (Prometheus, Grafana, Graphite, ELK, Icinga).
- Experience with configuration management (Puppet or similar).
- Familiarity with Docker and Kubernetes in production environments.
- Background in AdTech, real-time data systems, or low-latency/high-throughput environments.
Culture & Benefits
- A collaborative environment where team success is valued.
- Opportunities to continuously grow skills and learn new technologies.
- Emphasis on strategic thinking and deep dives into complex systems.
- A proactive approach to problem-solving and infrastructure reliability.
Hiring process
- Initial Screening Call (30 mins).
- Team Lead Screening (30 mins).
- Technical Interview with SREs (1 hour).
- General Discussion with VP of Data Engineering (30 mins).
- Technical Interview with Principal Architect (1 hour).
- Meet & Greet with SVP of Engineering (15 mins).
- Final Video Call with Sr. Director of Data Management at WebMD (30 mins).