Company hidden
5 hours ago

Senior Site Reliability Engineer (SRE)

$99,090 - $123,860
Job type: full-time
Grade: senior
English: B2
Country: US
Listing from Hirify Global, a list of international tech companies

Job description

TL;DR

Senior Site Reliability Engineer (SRE/DevOps): Build and maintain scalable infrastructure and automation for traditional services and AI-driven workloads, with an emphasis on reliability, observability, and CI/CD for model deployment workflows. The role focuses on incident response, root-cause analysis, and partnering with AI/ML teams to support training, serving, and lifecycle management.

Location: Atlanta, GA

Salary: $99,090 - $123,860 USD (annual base)

Company

hirify.global provides financial services focused on helping customers and communities build a better financial future.

What you will do

  • Design, build, and maintain scalable infrastructure and automation tools for traditional and AI-based systems.
  • Develop software to improve reliability and reduce manual toil.
  • Implement and manage CI/CD pipelines, including model deployment workflows.
  • Monitor performance, availability, and security using modern observability tools.
  • Collaborate with data science and ML engineering teams to support AI/ML training, serving, and lifecycle management.
  • Lead incident response, root cause analysis, and postmortem processes; advocate SRE principles across engineering and AI teams.

Requirements

  • 5+ years of experience in SRE, DevOps, or software engineering.
  • Strong programming skills (e.g., Python, Java).
  • Experience supporting AI/ML workloads (model training, inference, GPU orchestration).
  • Deep understanding of Linux systems, cloud platforms (primarily Azure and AWS), and container orchestration.
  • Experience with infrastructure-as-code and version-control tooling (e.g., Terraform, Ansible, GitHub).
  • Proficiency with monitoring/logging tools (e.g., Dynatrace) and strong networking, security, and distributed systems knowledge.

Nice to have

  • Experience with AI model observability, drift detection, or performance monitoring.
  • Contributions to open-source SRE/DevOps/ML infrastructure tools.
  • Cloud platform certifications.

Culture & Benefits

  • Health, dental, vision, and life insurance plans.
  • 401(k) savings plan with generous company matching (up to 6%).
  • Employer-paid retirement plan (cash balance retirement plan, 4%).
  • Tuition reimbursement up to $5,250/year.
  • Paid time off (20 days), paid company holidays, and a flexible Diversity Celebration Day.
  • Paid volunteer time (40 hours per calendar year).

Hiring process

  • Interviews to assess SRE/DevOps experience, reliability/observability practices, and experience with AI/ML infrastructure.
  • Discussion of collaboration approach and incident/operations leadership.

Be careful: if an employer asks you to log in to their system using iCloud/Google, send a code or password, or run code or software, do not do it; this is a scam. Be sure to click "Report" or contact support.