Senior Site Reliability Engineer (SRE)

$99,090 - $123,860

Employment type

full-time

Grade

senior

English

Country

Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

Senior Site Reliability Engineer (SRE/DevOps): Build and maintain scalable infrastructure and automation for traditional services and AI-driven workloads, with an emphasis on reliability, observability, and CI/CD for model deployment workflows. The role focuses on incident response, root-cause analysis, and partnering with AI/ML teams to support model training, serving, and lifecycle management.

Location: Atlanta, GA

Salary: $99,090 - $123,860 USD (annual base)

Company

hirify.global provides financial services focused on helping customers and communities build a better financial future.

What you will do

Design, build, and maintain scalable infrastructure and automation tools for traditional and AI-based systems.
Develop software to improve reliability and reduce manual toil.
Implement and manage CI/CD pipelines, including model deployment workflows.
Monitor performance, availability, and security using modern observability tools.
Collaborate with data science and ML engineering teams to support AI/ML training, serving, and lifecycle management.
Lead incident response, root cause analysis, and postmortem processes; advocate SRE principles across engineering and AI teams.

Requirements

5+ years of experience in SRE, DevOps, or software engineering.
Strong programming skills (e.g., Python, Java).
Experience supporting AI/ML workloads (model training, inference, GPU orchestration).
Deep understanding of Linux systems, cloud platforms (primarily Azure and AWS), and container orchestration.
Experience with infrastructure-as-code and version-control tooling (e.g., Terraform, Ansible, GitHub).
Proficiency with monitoring/logging tools (e.g., Dynatrace) and strong networking, security, and distributed systems knowledge.

Nice to have

Experience with AI model observability, drift detection, or performance monitoring.
Contributions to open-source SRE/DevOps/ML infrastructure tools.
Cloud platform certifications.

Culture & Benefits

Health, dental, vision, and life insurance plans.
401(k) savings plan with generous company matching (up to 6%).
Employer-paid retirement plan (cash balance retirement plan, 4%).
Tuition reimbursement up to $5,250/year.
Paid time off (20 days), paid company holidays, and a flexible Diversity Celebration Day.
Paid volunteer time (40 hours per calendar year).

Hiring process

Interviews to assess SRE/DevOps experience, reliability/observability practices, and experience with AI/ML infrastructure.
Discussion of collaboration approach and incident/operations leadership.
