Performance Engineer (AI)

Work format

On-site

Employment type

Full-time

Grade

Senior

English

Country

Job description

TL;DR

Performance Engineer (AI): Build and maintain automated performance testing frameworks for LLM inference, microservices, and infrastructure, with an emphasis on VoIP quality characterization and system latency. The role focuses on designing high-scale load testing infrastructure, tuning database performance, and establishing SLIs/SLOs to ensure system reliability in healthcare.

Location: Palo Alto

Company

hirify.global is a generative AI company specializing in safety-focused LLMs designed for autonomous clinical conversations in healthcare.

What you will do

Design and maintain automated performance testing frameworks for LLM inference, REST/gRPC microservices, and infrastructure (PostgreSQL, Redis, message queues).
Integrate performance suites into CI/CD to gate deployments against latency and throughput regressions.
Measure and track VoIP quality metrics (MOS, jitter, packet loss, echo) and build synthetic call load testing infrastructure.
Define SLIs/SLOs and develop real-time visibility dashboards using Grafana.
Partner with ML, Speech, Backend, and Infra teams to translate performance findings into prioritized engineering work.
Contribute to incident reviews and author runbooks to help other engineers instrument their services.

Requirements

10+ years in performance engineering with a strong software development and SRE background.
Proven experience building automated performance test harnesses using tools like Locust, k6, Gatling, or JMeter.
Deep expertise in PostgreSQL performance tuning and Redis optimization.
Strong grasp of distributed systems fundamentals, including queueing theory, tail latency, and backpressure.
Fluency with observability tools such as Prometheus, Grafana, and CloudWatch.
Working knowledge of SIP and RTP/RTCP, and experience with VoIP testing tools such as SIPp.

Nice to have

Experience benchmarking ML inference servers (vLLM, TensorRT-LLM, Triton).
Kubernetes workload profiling and resource right-sizing.
Chaos engineering experience using Toxiproxy, Gremlin, or Chaos Monkey.
Background in healthcare tech or high-reliability, latency-sensitive real-time communications.

Culture & Benefits

Opportunity to work on category-creating technology that transforms patient outcomes at a global scale.
Collaboration with a world-class team of AI pioneers and researchers from Stanford, Google, Meta, and NVIDIA.
Strong financial backing from leading investors including a16z, CapitalG, and General Catalyst.
High-impact role with ownership over performance across the entire technical stack.
