Sr. Software Engineer-AI Reliability (AI)

$150,000-$210,000

Work format

Remote (USA only)

Employment type

Full-time

Level

Senior

Job description

TL;DR

Sr. Software Engineer-AI Reliability (AI): Improve the reliability, performance, and scalability of production AI systems, focusing on distributed services that span the application, database, and container orchestration layers. Core work includes refactoring for resilience, diagnosing issues in data pipelines, building monitoring and alerting tools, and productionizing predictive systems at scale.

Location: Remote (US); must be able to travel to the Santa Barbara, CA office a few times per year

Salary: $150,000-$210,000

Company

Leading provider of AI-powered cybersecurity solutions using patented context-aware AI for real-time threat detection across cloud, hybrid, and on-premises environments.

What you will do

Own reliability, performance, and operational health of production AI services
Refactor and harden systems to improve resilience, clarity, and maintainability
Diagnose and resolve issues across distributed services, data pipelines, and storage
Design and implement monitoring, alerting, and debugging tools to support high availability
Partner with ML researchers and engineers to productionize systems at scale
Establish best practices for testing, deployment, capacity planning, and incident response
Contribute to incident response and postmortems for continuous improvement

Requirements

Ability to travel to our office in Santa Barbara, CA, a few times per year
7+ years of professional software engineering experience
Strong proficiency in Python and at least one JVM language (Java, Scala, Kotlin)
Proven experience designing, building, and operating distributed systems in production
Strong understanding of service architecture, concurrency, resource management, and distributed failure modes
Experience operating Kubernetes deployments
Strong experience with relational databases, query performance, indexing, and connection management
Demonstrated ability to diagnose performance, scalability, and reliability issues
Experience with automated testing and production observability (logging, metrics, tracing)
Experience collaborating with ML or data science teams

Culture & Benefits

Remote-First Work Culture
Healthcare (Medical, Dental, Vision, Accident)
Basic & Voluntary Life and AD&D
Flexible Spending Account (FSA)
401(k) with Employer Match
Paid Holidays & Flexible Paid Time Off (PTO)

Hiring process

Conversations about production systems you’ve owned, improved, and operated
Live refactoring and testing exercise in Java, Kotlin, or Scala
Distributed systems discussion on performance, state, failure modes, and debugging
ML production discussion on stabilizing model-driven systems
Final in-person conversation at Santa Barbara office on ownership and leadership
