Company hidden
4 days ago

Principal Software Engineer (AI)

$304,200
Work format
onsite
Employment type
fulltime
Grade
principal
English
B2
Country
US

Job description

TL;DR

Principal Software Engineer (AI): Develop and optimize large-scale, distributed ad-serving and inference platforms with an emphasis on GPU-accelerated systems, low-latency serving, and scalable model deployment. The role focuses on designing high-performance serving systems, profiling and tuning GPU/CPU workloads, and ensuring live-site reliability for global ad infrastructure.

Location: Redmond, United States (onsite)

Salary: $139,900–$304,200 per year depending on location

Company

hirify.global is a global technology corporation specializing in software, AI, and cloud services.

What you will do

  • Design and lead development of distributed online serving systems with GPU and CPU inference pipelines processing millions of ad requests per second.
  • Architect and optimize end-to-end inference infrastructure including model serving, batching, caching, and resource orchestration across heterogeneous hardware.
  • Profile and optimize performance from CUDA kernels to OS-level scheduling to improve latency and cost efficiency.
  • Own live-site reliability with telemetry, alerting, and fault-tolerance mechanisms for globally distributed systems.
  • Collaborate across teams, lead architecture reviews, and mentor engineers in performance engineering and debugging.

Requirements

  • Location: Must be based in or willing to work onsite in Redmond, United States
  • Bachelor’s degree in Computer Science or a related field and 6+ years of experience in high-performance distributed systems development in C++, or equivalent experience.
  • Experience with advertising or search engine backend systems, real-time data streaming, and multi-region deployment.
  • Expertise in GPU inference frameworks (NVIDIA Triton, CUDA, TensorRT) and low-level system internals including multi-threading and NUMA-aware memory allocation.
  • Strong skills in profiling, performance tuning, and operating large-scale systems with SLA-based capacity forecasting and autoscaling.
  • English proficiency: B2 or higher required
