Software Engineer Reliability (AI)
Job description
TL;DR
Software Engineer Reliability (AI): Ensuring the reliability, scalability, and performance of AI infrastructure systems, with an emphasis on fault-tolerant design, automation, and observability. Focus on building resilient systems, running load and chaos tests, and collaborating cross-functionally to support millions of users reliably.
Location: San Francisco HQ, onsite only with relocation assistance
Salary: $255K – $490K + Equity
Company
An AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity by safely deploying advanced AI technologies.
What you will do
- Design and implement scalable infrastructure solutions to meet growing demands.
- Build and maintain load, chaos, and synthetic testing tools to improve system reliability.
- Develop automation tools to streamline tasks and enhance reliability.
- Manage CPU/storage, GPU, and network lifecycle platforms for resource optimization.
- Implement fault-tolerant and resilient design patterns to minimize disruptions.
- Develop and maintain SLOs and SLIs to measure system reliability.
- Collaborate with cross-functional teams to deliver new features and research capabilities.
- Participate in on-call rotations to ensure 24/7 system availability.
Requirements
- Must be based in, or willing to relocate to, San Francisco; the role is onsite only.
- Bachelor's degree in Computer Science or equivalent experience.
- Proven experience in reliability engineering or similar roles in fast-paced environments.
- Strong proficiency with cloud infrastructure, containerization (Kubernetes), and IaC tools (Terraform, CloudFormation).
- Experience with observability tools such as Datadog, Prometheus, Grafana, and Splunk.
- Excellent problem-solving, communication, and collaboration skills.
Nice to have
- Experience with microservices architecture and service mesh technologies.
- Knowledge of security best practices in cloud environments.
Culture & Benefits
- Relocation assistance provided for new employees.
- Equal opportunity employer with a commitment to diversity and inclusion.
- Fast-paced, collaborative, and mission-driven work environment.
- Focus on safety and responsible AI deployment.