Company hidden
Posted 1 day ago

Senior Software Engineer, Observability Insights (AI)

Salary: $165,000 - $242,000
Work format: hybrid
Employment type: full-time
Grade: senior
English: B2
Country: US

Job description

TL;DR

Senior Software Engineer, Observability Insights (AI): Build product experiences and agentic interfaces for a foundational telemetry layer, with an emphasis on multi-tenant APIs, managed Grafana, and MCP-based tool servers. The focus is on helping users understand, troubleshoot, and optimize complex AI systems, including agentic observability capabilities for guided debugging and workload optimization.

Location: New York, NY / Sunnyvale, CA. While we prioritize a hybrid work environment, remote work may be considered for candidates located more than 30 miles from an office, based on role requirements for specialized skill sets. New hires will be invited to attend onboarding at one of our hubs within their first month. Teams also gather quarterly to support collaboration. Must be eligible to access export-controlled information as a U.S. person (U.S. citizen, lawful permanent resident, refugee, or asylee) or eligible to obtain export authorization.

Salary: $165,000–$242,000

Company

hirify.global is The Essential Cloud for AI™, providing a platform of technology, tools, and teams to build and scale AI with confidence.

What you will do

  • Design and develop highly available, multi-tenant APIs that expose telemetry and derived insights.
  • Modernize user interaction with data by building agentic experiences, including MCP servers and API gateways.
  • Build agentic observability capabilities for guided debugging, workload optimization, and incident summarization.
  • Develop and enforce best practices regarding the health of telemetry data pipelines, focusing on correlation and aggregation.
  • Improve the performance, security, reliability, and scalability of insights services, and participate in the on-call rotation.
  • Collaborate with internal engineering teams to embed observability best practices and custom tooling.

Requirements

  • Six or more years of experience in software or infrastructure engineering, with a focus on building production-grade backend systems and distributed APIs.
  • Well versed in reliability engineering concepts, including evaluation datasets for LLMs, error budgets, and fault-tolerant design.
  • Familiar with observability systems such as ClickHouse, Loki, VictoriaMetrics, Prometheus, and Grafana.
  • Experienced in building agentic applications or LLM features, with a pragmatic approach to grounding and tool calling.
  • Comfortable using Go as your primary programming language and working with Python components.
  • Must be eligible to access export-controlled information as a U.S. person (U.S. citizen, lawful permanent resident, refugee, or asylee) or eligible to obtain export authorization.

Nice to have

  • Experience operating Kubernetes clusters at scale and debugging real-world AI workloads.
  • Experience with logging, tracing, and metrics platforms in production and at scale.
  • Experience running distributed systems and API services at cloud scale, including event streaming or data pipeline management.
  • Experience building services or products with LLMs, MCP, and agentic frameworks such as LangChain and AgentCore.

Culture & Benefits

  • Hybrid work environment, with remote work considered for candidates located more than 30 miles from an office.
  • Competitive base salary range with a discretionary bonus and equity awards.
  • Comprehensive benefits program including medical, dental, and vision insurance (100% paid for by hirify.global).
  • Company-paid life insurance, short and long-term disability insurance, FSA, HSA, and tuition reimbursement.
  • Ability to participate in the Employee Stock Purchase Program (ESPP), mental wellness benefits, and family-forming support.
  • Paid parental leave; flexible, full-service childcare support with Kinside; and a 401(k) with a generous employer match.
  • Flexible PTO, catered lunch each day in our office and data center locations, and a casual work environment.
