Company hidden
1 day ago

Senior Software Engineer - AI Infra Visibility

Work format
onsite
Employment type
fulltime
Grade
senior
English
b2
Country
US
Job from Hirify Global, a list of international tech companies

Job description

TL;DR

Senior Software Engineer (AI Infra Visibility): Design and build scalable backend systems for AI and GPU cluster observability, with an emphasis on telemetry ingestion, data processing, and API development. The role focuses on architecting large distributed systems, detecting complex infrastructure issues, and ensuring performance and reliability for AI workloads.

Location: Onsite in Palo Alto, California

Company

hirify.global, founded by Stanford researchers and veteran systems engineers, is pioneering a software-driven approach to AI fabrics to increase GPU cluster utilization.

What you will do

  • Design and build scalable backend systems for metric collection, processing, and analysis.
  • Develop robust methods to detect complex infrastructure issues impacting AI workloads.
  • Build and operate large distributed systems running in production environments.
  • Collaborate across teams to deliver reliable, performant, and maintainable systems.

Requirements

  • 7+ years of industry experience building and operating production software systems.
  • Strong foundation in data structures, algorithms, and software design.
  • Fluency in one or more programming languages: C, C++, Go, Java, or Python.
  • Experience designing, building, and scaling large distributed systems.
  • Hands-on experience with service-oriented architectures and cloud platforms (AWS, GCP, Azure).
  • Solid understanding of operating systems fundamentals.
  • Experience with databases, including design, development, or scaling.
  • Excellent debugging, problem-solving, and communication skills.

Nice to have

  • Knowledge of networking protocols.
  • Understanding of GPU or AI infrastructure (e.g., DCGM, PyTorch).
  • Familiarity with observability systems (metrics, logs, traces); experience with OpenTelemetry, Prometheus, or distributed tracing.

Culture & Benefits

  • Challenging projects in a friendly and inclusive workplace culture.
  • Competitive compensation and a great benefits package.
  • Catered lunch.
