Company hidden
1 day ago

Sr. Software Engineer-AI Reliability (AI)

$150,000 - $210,000
Work format
remote (USA only)
Job type
full-time
Grade
senior
English
B2
Country
US
Listing from Hirify Global, a list of international tech companies

Job description

TL;DR

Sr. Software Engineer-AI Reliability (AI): Enhance the reliability, performance, and scalability of production AI systems, with an emphasis on distributed services across the application, database, and container orchestration layers. Focus on refactoring for resilience, diagnosing issues in data pipelines, implementing monitoring and alerting tools, and productionizing predictive systems at scale.

Location: Remote (US), ability to travel to Santa Barbara, CA office a few times per year

Salary: $150,000-$210,000

Company

Leading provider of AI-powered cybersecurity solutions using patented context-aware AI for real-time threat detection across cloud, hybrid, and on-premises environments.

What you will do

  • Own reliability, performance, and operational health of production AI services
  • Refactor and harden systems to improve resilience, clarity, and maintainability
  • Diagnose and resolve issues across distributed services, data pipelines, and storage
  • Design and implement monitoring, alerting, and debugging tools for high availability
  • Partner with ML researchers and engineers to productionize systems at scale
  • Establish best practices for testing, deployment, capacity planning, and incident response
  • Contribute to incident response and postmortems for continuous improvement

Requirements

  • Ability to travel to our office in Santa Barbara, CA, a few times per year
  • 7+ years of professional software engineering experience
  • Strong proficiency in Python and at least one JVM language (Java, Scala, Kotlin)
  • Proven experience designing, building, and operating distributed systems in production
  • Strong understanding of service architecture, concurrency, resource management, and distributed failure modes
  • Experience operating Kubernetes deployments
  • Strong experience with relational databases, query performance, indexing, and connection management
  • Demonstrated ability to diagnose performance, scalability, and reliability issues
  • Experience with automated testing and production observability (logging, metrics, tracing)
  • Experience collaborating with ML or data science teams

Culture & Benefits

  • Remote-First Work Culture
  • Healthcare (Medical, Dental, Vision, Accident)
  • Basic & Voluntary Life and AD&D
  • Flexible Spending Account (FSA)
  • 401(k) with Employer Match
  • Paid Holidays & Flexible Paid Time Off (PTO)

Hiring process

  • Conversations about production systems you’ve owned, improved, and operated
  • Live refactoring and testing exercise in Java, Kotlin, or Scala
  • Distributed systems discussion on performance, state, failure modes, and debugging
  • ML production discussion on stabilizing model-driven systems
  • Final in-person conversation at Santa Barbara office on ownership and leadership
