This job posting is archived

Company hidden
Updated 23 days ago

Software Engineer (AI Infra Visibility)

Work format
onsite
Employment type
fulltime
Grade
middle
English
b2
Country
US

Job description


TL;DR

Software Engineer (AI Infra Visibility): Design, build, and scale backend systems for AI and GPU cluster observability, with an emphasis on high-performance distributed systems that power telemetry ingestion, data processing, and APIs. Focus on detecting complex infrastructure issues that impact AI workloads.

Location: On-site, Palo Alto, California

Company

Clockwork Systems is pioneering a software-driven approach to AI fabrics by delivering cross-stack observability, workload fault tolerance, and performance acceleration.

What you will do

  • Design and build scalable backend systems for metric collection, processing, and analysis.
  • Develop robust methods to detect complex infrastructure issues that impact AI workloads.
  • Build large distributed systems running in production environments.
  • Collaborate across teams to deliver reliable, performant, and maintainable systems.

Requirements

  • 2+ years of industry experience building and operating production software systems.
  • Strong foundation in data structures, algorithms, and software design.
  • Fluency in one or more programming languages: C, C++, Go, Java, or Python.
  • Solid understanding of operating systems fundamentals (threads, scheduling, synchronization; kernel programming is a plus).
  • Experience with databases, including design, development, or scaling.
  • Excellent debugging, problem-solving, and communication skills.

Nice to have

  • Knowledge of networking protocols; familiarity with NIC architecture and operation.
  • Understanding of GPU or AI infrastructure (e.g., DCGM, PyTorch).
  • Familiarity with observability systems (metrics, logs, traces); experience with OpenTelemetry, Prometheus, or distributed tracing is a bonus.
  • Experience designing, building, and scaling large distributed systems.
  • Hands-on experience with service-oriented architectures and cloud platforms (AWS, GCP, Azure).

Culture & Benefits

  • Challenging projects.
  • A friendly and inclusive workplace culture.
  • Competitive compensation.
  • A great benefits package.
  • Catered lunch.