Company hidden
Posted 1 day ago

Principal Software Engineer (AI)

$139,900 – $304,200
Work format
onsite
Employment type
fulltime
Level
principal
English
B2
Country
US
Vacancy from Hirify Global, a list of international tech companies

Job description


TL;DR

Principal Software Engineer (AI): advancing the core capabilities of hirify.global Advertising's ad-serving infrastructure, which powers advertising across Bing Search, MSN, hirify.global Start, and shopping experiences in the Edge browser, with a focus on GPU/CPU inference, real-time bidding, and intelligent ranking pipelines. The role centers on designing and optimizing high-performance serving systems and GPU inference frameworks that deliver measurable latency improvements and cost efficiency.

Location: Must be located in Mountain View, United States

Salary: USD $139,900 – $304,200 per year.

Company

hirify.global is an equal opportunity employer.

What you will do

  • Design and lead the development of large-scale, distributed online serving systems, including GPU-accelerated and CPU-based ranking/inference pipelines.
  • Architect and optimize end-to-end inference infrastructure, including model serving, batching/streaming, caching, scheduling, and resource orchestration across heterogeneous hardware.
  • Profile and optimize performance across the full stack from CUDA kernels and GPU pipelines to CPU threads and OS-level scheduling.
  • Own live-site reliability as a DRI: design telemetry, alerting, and fault-tolerance mechanisms.
  • Collaborate and mentor across teams—driving architecture reviews, enforcing engineering excellence, and promoting system-level optimization practices.
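The batching/streaming work in the responsibilities above can be illustrated with a minimal sketch. This is a hypothetical micro-batcher, not the team's actual serving stack: it groups incoming inference requests and flushes them to a backend once a size threshold is hit. Production systems such as Triton's dynamic batcher also flush on a timeout; that is elided here to keep the sketch single-threaded. `MicroBatcher` and its parameters are illustrative names.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical sketch: accumulate requests, hand them to the backend
// in batches of up to `max_batch` to amortize per-call overhead.
class MicroBatcher {
public:
    using Batch = std::vector<int>;  // request ids (placeholder payload)
    using Backend = std::function<void(const Batch&)>;

    MicroBatcher(std::size_t max_batch, Backend backend)
        : max_batch_(max_batch), backend_(std::move(backend)) {}

    void submit(int request_id) {
        pending_.push_back(request_id);
        if (pending_.size() >= max_batch_) flush();
    }

    // Drain whatever is queued (e.g. on timeout or shutdown).
    void flush() {
        if (pending_.empty()) return;
        backend_(pending_);
        pending_.clear();
    }

private:
    std::size_t max_batch_;
    Backend backend_;
    Batch pending_;
};
```

Submitting ten requests with `max_batch = 4` yields two full batches of four and, after a final `flush()`, one tail batch of two.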

Requirements

  • Bachelor’s Degree in Computer Science or related technical field AND 6+ years technical engineering experience developing high-performance, distributed systems in C++.
  • Industry experience in advertising or search engine backend systems, such as large-scale ad ranking, real-time bidding (RTB), or relevance-serving infrastructure.
  • Hands-on experience with real-time data streaming systems (Kafka, Flink, Spark Streaming), feature-store integration, and multi-region deployment for low-latency, globally distributed services.
  • Deep expertise in GPU inference frameworks such as NVIDIA Triton Inference Server, CUDA, and TensorRT, including hands-on experience implementing custom CUDA kernels.
  • Expertise in low-level system and OS internals, including multi-threading, process scheduling, NUMA-aware memory allocation, lock-free data structures, context switching, and I/O stack tuning.
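As one concrete instance of the lock-free data structures named in the requirements, here is a sketch of a bounded single-producer/single-consumer ring buffer: one atomic head and one atomic tail with acquire/release ordering instead of a mutex. It assumes exactly one producer thread and one consumer thread and a power-of-two capacity; the class name and layout are illustrative.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

// Hypothetical SPSC lock-free queue sketch. Indices grow monotonically;
// the slot is index & (Capacity - 1). Release stores publish writes,
// acquire loads observe them, so no locks are needed.
template <typename T, std::size_t Capacity>
class SpscQueue {
    static_assert((Capacity & (Capacity - 1)) == 0, "capacity must be a power of two");

public:
    bool push(const T& value) {  // producer thread only
        const auto tail = tail_.load(std::memory_order_relaxed);
        if (tail - head_.load(std::memory_order_acquire) == Capacity)
            return false;  // full
        buf_[tail & (Capacity - 1)] = value;
        tail_.store(tail + 1, std::memory_order_release);
        return true;
    }

    std::optional<T> pop() {  // consumer thread only
        const auto head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire))
            return std::nullopt;  // empty
        T value = buf_[head & (Capacity - 1)];
        head_.store(head + 1, std::memory_order_release);
        return value;
    }

private:
    std::array<T, Capacity> buf_{};
    std::atomic<std::size_t> head_{0};
    std::atomic<std::size_t> tail_{0};
};
```

The acquire/release pairing is the interview-relevant detail: the producer's release store on `tail_` makes the written slot visible before the consumer's acquire load observes the new index.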

Nice to have

  • Master’s Degree in Computer Science or related technical field AND 8+ years technical engineering experience developing high-performance, distributed systems in C++.
  • Familiarity with LLM inference optimization—model sharding, tensor/kv-cache parallelism, paged attention, continuous batching, quantization (AWQ/FP8), and hybrid CPU–GPU orchestration.
  • Strong understanding of model-serving trade-offs—batching vs. streaming, latency vs. throughput, quantization (FP16/BF16/INT8), dynamic batching, continuous model rollout, and adaptive inference scheduling across CPU/GPU tiers.
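Of the quantization schemes listed above, symmetric per-tensor INT8 is the simplest and shows the core trade-off: one scale factor maps floats into [-127, 127], trading precision (error bounded by roughly half the scale) for a 4x memory reduction versus FP32. This is a generic illustrative sketch, not AWQ or FP8; those refine the same idea with activation-aware and per-channel scales.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical sketch of symmetric per-tensor INT8 quantization:
// scale = max|x| / 127, values are rounded to the nearest integer step.
struct QuantizedTensor {
    std::vector<int8_t> data;
    float scale;
};

inline QuantizedTensor quantize_int8(const std::vector<float>& x) {
    float amax = 0.0f;
    for (float v : x) amax = std::max(amax, std::fabs(v));
    const float scale = amax > 0.0f ? amax / 127.0f : 1.0f;
    QuantizedTensor q{std::vector<int8_t>(x.size()), scale};
    for (std::size_t i = 0; i < x.size(); ++i)
        q.data[i] = static_cast<int8_t>(std::lround(x[i] / scale));
    return q;
}

inline std::vector<float> dequantize(const QuantizedTensor& q) {
    std::vector<float> out(q.data.size());
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = static_cast<float>(q.data[i]) * q.scale;
    return out;
}
```

Round-tripping `{1.0, -0.5, 0.25}` maps the largest-magnitude value exactly to 127 and reconstructs the rest to within one quantization step.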

Culture & Benefits

  • The typical base pay range for this role across the U.S. is USD $139,900 – $274,800 per year.
  • The base pay range for this role in the San Francisco Bay area and New York City metropolitan area is USD $188,000 – $304,200 per year.
  • Certain roles may be eligible for benefits and other compensation.
