TL;DR
Senior Software Engineer – Real-Time Workflows & ML Serving (AI): Build and optimize real-time data pipelines and ML serving systems for modern ads platforms, with an emphasis on streaming ETL, feature computation, low-latency serving, and system reliability. The role focuses on designing and implementing robust streaming systems, owning SLOs, and optimizing end-to-end performance and cost for ML models at massive scale.
Location: Bengaluru, India
Company
hirify.global builds and operates large-scale, latency-sensitive systems that serve billions of requests, powering modern ads platforms.
What you will do
- Design and implement real-time streaming ETL/feature pipelines (Flink or Spark Structured Streaming) that meet strict freshness and correctness constraints.
- Build and operate reliable messaging and ingestion systems with Kafka/Pulsar.
- Own data contracts between producers, pipelines, and consumers, managing schema evolution and validation.
- Define and meet SLOs using OpenTelemetry/Prometheus/Grafana for metrics, tracing, and alerting.
- Integrate pipelines with online stores/caches and ML consumers (feature stores, embedding pipelines, LLM API calls).
- Optimize end-to-end performance and efficiency (CPU/memory/I/O, serialization, caching, concurrency) and contribute to serving/inference integrations (Triton/ONNX Runtime/TensorRT).
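To give a flavor of the feature-computation work above, here is a toy sketch of tumbling-window, per-user click counting in plain Python. The `ClickEvent` type and field names are hypothetical; a production pipeline would implement this in Flink or Spark Structured Streaming with event-time watermarks and managed state rather than an in-memory dict.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class ClickEvent:
    user_id: str
    ts_ms: int  # event time, epoch milliseconds

def tumbling_window_counts(events, window_ms=60_000):
    """Assign each event to a tumbling event-time window and count
    clicks per (user, window_start) -- the shape of a simple
    per-user feature a streaming job might materialize."""
    counts = defaultdict(int)
    for e in events:
        window_start = (e.ts_ms // window_ms) * window_ms
        counts[(e.user_id, window_start)] += 1
    return dict(counts)

events = [
    ClickEvent("u1", 1_000),
    ClickEvent("u1", 59_000),
    ClickEvent("u1", 61_000),  # falls into the next one-minute window
    ClickEvent("u2", 2_000),
]
features = tumbling_window_counts(events)
# features[("u1", 0)] == 2, features[("u1", 60_000)] == 1
```

The real systems differ mainly in handling late data, exactly-once state, and scale, but the windowing arithmetic is the same.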
Requirements
- Bachelor’s or Master’s degree in Computer Science or related field, with 6+ years of related experience.
- Strong programming skills in C++, C#, or Python.
- Hands-on experience in one or more: building and operating streaming data pipelines in production (Flink or Spark Structured Streaming), distributed systems engineering with strong reliability, or messaging systems such as Kafka/Pulsar.
- Experience operating services with Kubernetes/containers and production readiness practices.
- Experience with observability stacks such as OpenTelemetry, Prometheus, Grafana.
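As an illustration of the SLO work mentioned above, the sketch below checks a latency SLO against raw samples using the nearest-rank percentile definition. This is purely didactic: in practice these targets are evaluated with Prometheus histograms and `histogram_quantile`, not by sorting samples in application code, and the 50 ms target is an invented example.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile (0 < p <= 100) over latency samples."""
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[rank - 1]

def meets_slo(latencies_ms, p=99, target_ms=50):
    """True if the p-th percentile latency is at or under the target."""
    return percentile(latencies_ms, p) <= target_ms

latencies = [12, 18, 25, 30, 41, 47, 49, 52, 55, 120]
# With 10 samples, the nearest-rank p99 is the largest sample (120 ms),
# so a 50 ms p99 SLO is violated here.
```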
Nice to have
- Experience with feature stores, embedding pipelines, and online/offline consistency.
- Experience with data lakehouse/table formats and optimizations (partitioning, compaction, incremental processing).
- Experience with GPU inference serving (Triton, ONNX Runtime/TensorRT) and performance techniques.
- Background in cost/performance modeling, capacity planning, and reliability improvements for high-scale data platforms.
- Experience in ads/search/recommendations or other high-scale systems where freshness, latency, and cost are important.