Company hidden
1 day ago

Software Engineer (AI Infra Visibility)

Work format
On-site
Employment type
Full-time
Grade
Middle
English
B2
Country
US

Job description

TL;DR

Software Engineer (AI Infra Visibility): Design, build, and scale backend systems for AI and GPU cluster observability, with an emphasis on high-performance distributed systems that power telemetry ingestion, data processing, and APIs. Focus on detecting complex infrastructure issues that impact AI workloads.

Location: On-site, Palo Alto, California

Company

Clockwork Systems is pioneering a software-driven approach to AI fabrics by delivering cross-stack observability, workload fault tolerance, and performance acceleration.

What you will do

  • Design and build scalable backend systems for metric collection, processing, and analysis.
  • Develop robust methods to detect complex infrastructure issues that impact AI workloads.
  • Build large distributed systems running in production environments.
  • Collaborate across teams to deliver reliable, performant, and maintainable systems.

Requirements

  • 2+ years of industry experience building and operating production software systems.
  • Strong foundation in data structures, algorithms, and software design.
  • Fluency in one or more programming languages: C, C++, Go, Java, or Python.
  • Solid understanding of operating systems fundamentals (threads, scheduling, synchronization; kernel programming is a plus).
  • Experience with databases, including design, development, or scaling.
  • Excellent debugging, problem-solving, and communication skills.

Nice to have

  • Knowledge of networking protocols; familiarity with NIC architecture and operation.
  • Understanding of GPU or AI infrastructure (e.g., DCGM, PyTorch).
  • Familiarity with observability systems (metrics, logs, traces); experience with OpenTelemetry, Prometheus, or distributed tracing is a bonus.
  • Experience designing, building, and scaling large distributed systems.
  • Hands-on experience with service-oriented architectures and cloud platforms (AWS, GCP, Azure).

Culture & Benefits

  • Challenging projects.
  • A friendly and inclusive workplace culture.
  • Competitive compensation.
  • A great benefits package.
  • Catered lunch.
