Company hidden
4 days ago

Senior Software Engineer (AI)

$119,800 - $234,700
Work format
onsite
Employment type
fulltime
Grade
senior
English
B2
Country
US
Job from Hirify Global, a list of international tech companies

Job description

TL;DR

Senior Software Engineer (AI): Design and optimize high-performance serving systems and GPU inference frameworks for hirify.global's ad-serving infrastructure, with an emphasis on measurable latency improvements and cost efficiency. The role focuses on architectural design for massive global scale, real-time bidding, and intelligent ranking pipelines, including deep learning and LLM workloads.

Location: Redmond, United States

Salary: USD $119,800 – $234,700 per year (U.S. typical base pay range)

Company

hirify.global is a global technology company known for its software products, services, and hardware, with this role focusing on its advertising platform.

What you will do

  • Design and lead development of large-scale, distributed online serving systems for ad requests, including GPU-accelerated and CPU-based pipelines.
  • Architect and optimize end-to-end inference infrastructure, including model serving, batching/streaming, caching, scheduling, and resource orchestration.
  • Profile and optimize performance across the full stack, from CUDA kernels and GPU pipelines to CPU threads and OS-level scheduling.
  • Own live-site reliability, designing telemetry, alerting, and fault-tolerance mechanisms for globally distributed systems.
  • Collaborate and mentor across teams, driving architecture reviews and promoting system-level optimization practices.

Requirements

  • Bachelor’s Degree in Computer Science or related technical field.
  • 4+ years of technical engineering experience developing high-performance, distributed systems in C++.
  • Deep expertise in GPU inference frameworks and tooling such as NVIDIA Triton Inference Server, CUDA, and TensorRT.
  • Strong understanding of model-serving trade-offs (batching vs. streaming, latency vs. throughput, quantization).
  • Expertise in low-level system and OS internals, including multi-threading, NUMA-aware memory allocation, and I/O stack tuning.
  • Proven ability to profile and optimize GPU and system workloads for deep learning and LLM architectures.

Nice to have

  • Master’s Degree in Computer Science or related technical field and 6+ years of technical engineering experience in C++.
  • Hands-on experience with real-time data streaming systems (Kafka, Flink, Spark Streaming).
  • Familiarity with LLM inference optimization techniques (model sharding, paged attention).
  • Demonstrated success operating large-scale systems with SLA-based capacity forecasting and autoscaling.
  • Passion for performance engineering, observability, and deep systems debugging.

Culture & Benefits

  • hirify.global is an equal opportunity employer.
  • All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances.
  • Assistance is available for religious accommodations and/or a reasonable accommodation due to a disability during the application process.
