Member Of Technical Staff - Cloud Infrastructure (AI)
Job description
TL;DR
Member Of Technical Staff - Cloud Infrastructure (Kubernetes/GPU): Designing, building, and operating secure, scalable infrastructure for critical US government AI projects, with an emphasis on GPU hardware and Kubernetes clusters. Focus on automating bare-metal and hybrid cloud architectures, ensuring federal compliance, and optimizing high-scale AI workloads.
Location: In-person role based in Palo Alto, CA or Washington, DC. Up to 50% travel required.
Salary: $180,000 - $440,000 USD
Company
xAI is focused on creating AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge.
What you will do
- Develop software to provision and manage infrastructure across on-premise, virtual machine, and classified cloud environments.
- Optimize the reliability, performance, and cost-effectiveness of infrastructure for large-scale AI workloads in secure settings.
- Collaborate with engineers to design tailored solutions meeting government-specific needs and compliance standards.
- Implement robust observability, monitoring, and security practices according to federal protocols.
- Manage storage infrastructure using IaC tools such as Pulumi, Terraform, or Ansible.
- Drive system reliability through incident management, postmortems, and definition of SLAs and SLOs.
Requirements
- Active Top Secret (TS) security clearance.
- 5+ years of experience as an Infrastructure Engineer or SRE, preferably in secure or government environments.
- Proficiency in managing storage infrastructure with IaC tools (Pulumi, Terraform, or Ansible).
- Deep understanding of the Kubernetes stack, including CNI, CRI, and CSI components.
- Demonstrated ability to improve system reliability through incident management and SLO definition.
- Must be based in Palo Alto, CA or Washington, DC.
Nice to have
- Experience with GPU hardware installation, driver setup, and debugging.
- Experience optimizing Kubernetes for large-scale deployments in classified federal settings.
- Familiarity with chaos engineering and capacity planning.
- Proficiency with Kyverno, ArgoCD, or Go programming.
- Security certifications such as CISSP.
Culture & Benefits
- Competitive base salary and equity.
- Comprehensive medical, vision, and dental coverage.
- Access to a 401(k) retirement plan.
- Short- and long-term disability insurance and life insurance.
- Flat organizational structure emphasizing engineering excellence, curiosity, and hands-on contribution.