Company hidden
2 months ago

Performance Engineer (AI)

Work format
on-site
Employment type
full-time
Grade
senior
English
B2
Country
US
Listing from Hirify Global, a list of international tech companies

Job description

TL;DR

Performance Engineer (AI): Building and maintaining automated performance testing frameworks for LLM inference, microservices, and infrastructure, with an emphasis on VoIP quality characterization and system latency. Focus on designing high-scale load testing infrastructure, tuning database performance, and establishing SLIs/SLOs to ensure system reliability in healthcare.

Location: Palo Alto

Company

hirify.global is a generative AI company specializing in safety-focused LLMs designed for autonomous clinical conversations in healthcare.

What you will do

  • Design and maintain automated performance testing frameworks for LLM inference, REST/gRPC microservices, and infrastructure (PostgreSQL, Redis, message queues).
  • Integrate performance suites into CI/CD to gate deployments against latency and throughput regressions.
  • Measure and track VoIP quality metrics (MOS, jitter, packet loss, echo) and build synthetic call load testing infrastructure.
  • Define SLIs/SLOs and develop real-time visibility dashboards using Grafana.
  • Partner with ML, Speech, Backend, and Infra teams to translate performance findings into prioritized engineering work.
  • Contribute to incident reviews and author runbooks to help other engineers instrument their services.

Requirements

  • 10+ years in performance engineering with a strong software development and SRE background.
  • Proven experience building automated performance test harnesses using tools like Locust, k6, Gatling, or JMeter.
  • Deep expertise in PostgreSQL performance tuning and Redis optimization.
  • Strong grasp of distributed systems fundamentals, including queueing theory, tail latency, and backpressure.
  • Fluency with observability tools such as Prometheus, Grafana, and CloudWatch.
  • Working knowledge of SIP, RTP/RTCP and experience with VoIP testing tools like SIPp.

Nice to have

  • Experience benchmarking ML inference servers (vLLM, TensorRT-LLM, Triton).
  • Kubernetes workload profiling and resource right-sizing.
  • Chaos engineering experience using Toxiproxy, Gremlin, or Chaos Monkey.
  • Background in healthcare tech or high-reliability, latency-sensitive real-time communications.

Culture & Benefits

  • Opportunity to work on category-creating technology that transforms patient outcomes on a global scale.
  • Collaboration with a world-class team of AI pioneers and researchers from Stanford, Google, Meta, and NVIDIA.
  • Strong financial backing from leading investors including a16z, CapitalG, and General Catalyst.
  • High-impact role with ownership over performance across the entire technical stack.
