Company hidden
Updated 3 days ago

Software Engineer Reliability (AI)

$255,000 – $490,000
Work format: Onsite
Employment type: Full-time
Level: Senior
English: B2
Country: US
Relocation: US
Listed in Hirify Global, a list of international tech companies

Job description

TL;DR

Software Engineer Reliability (AI): Ensure the reliability, scalability, and performance of AI infrastructure systems, with an emphasis on fault-tolerant design, automation, and observability. The role focuses on building resilient systems, running load and chaos tests, and collaborating across teams to reliably serve millions of users.

Location: San Francisco HQ; onsite only, with relocation assistance

Salary: $255K – $490K + Equity

Company

hirify.global is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity by safely deploying advanced AI technologies.

What you will do

  • Design and implement scalable infrastructure solutions to meet growing demands.
  • Build and maintain load, chaos, and synthetic testing tools to improve system reliability.
  • Develop automation tools to streamline tasks and enhance reliability.
  • Manage CPU/storage, GPU, and network lifecycle platforms for resource optimization.
  • Implement fault-tolerant and resilient design patterns to minimize disruptions.
  • Develop and maintain SLOs and SLIs to measure system reliability.
  • Collaborate with cross-functional teams to deliver new features and research capabilities.
  • Participate in on-call rotations to ensure 24/7 system availability.

Requirements

  • Must be based in or relocate to San Francisco, onsite only.
  • Bachelor's degree in Computer Science or equivalent experience.
  • Proven experience in reliability engineering or similar roles in fast-paced environments.
  • Strong proficiency with cloud infrastructure, containerization (Kubernetes), and IaC tools (Terraform, CloudFormation).
  • Experience with observability tools such as Datadog, Prometheus, Grafana, and Splunk.
  • Excellent problem-solving, communication, and collaboration skills.

Nice to have

  • Experience with microservices architecture and service mesh technologies.
  • Knowledge of security best practices in cloud environments.

Culture & Benefits

  • Relocation assistance provided for new employees.
  • Equal opportunity employer with commitment to diversity and inclusion.
  • Fast-paced, collaborative, and mission-driven work environment.
  • Focus on safety and responsible AI deployment.