Staff Engineer, Datacenter Server Lifecycle (AI)
Job Description
TL;DR
Staff Engineer, Datacenter Server Lifecycle (Infrastructure/AI): Design and own the end-to-end operational journey of datacenter hardware, from provisioning to decommissioning, with an emphasis on automation, security, and scale. The focus is on building trusted compute standards, automating fleet management for tens of thousands of servers, and ensuring hardware integrity.
Location: Hybrid (San Francisco, CA, or New York City, NY). Must be in one of the offices at least 25% of the time.
Salary: $320,000 - $405,000 USD
Company
is a public benefit corporation dedicated to creating reliable, interpretable, and steerable AI systems that are safe and beneficial for society.
What you will do
- Lead the build-out of automation to support datacenters containing tens of thousands of servers.
- Define and own the end-to-end server lifecycle strategy, including provisioning, deployment, maintenance, and decommissioning.
- Partner with the Infrastructure Security team to design and enforce trusted compute standards across the server lifecycle.
- Collaborate with the Networking team to ensure end-to-end connectivity across all sites.
- Build and maintain tooling to track machine health, configuration, and operational status across the full fleet.
Requirements
- Hands-on experience with server hardware, including rack deployment, cabling, and troubleshooting at scale.
- Deep understanding of hardware lifecycle management, asset tracking, and provisioning workflows.
- Proficiency in at least one programming language such as Python, Rust, Go, or Java.
- Working knowledge of Kubernetes, Infrastructure as Code, AWS, and GCP.
- Willingness to travel occasionally to datacenter sites across North America.
- Bachelor's degree or equivalent professional experience.
Nice to have
- 8+ years of experience in datacenter operations or hardware infrastructure management.
- Hands-on experience with GPU or AI accelerator hardware (e.g., NVIDIA A100/H100, AMD MI300, TPUs).
- Familiarity with open-source firmware and boot tooling such as coreboot, LinuxBoot, or u-root.
- Experience with trusted compute and hardware security concepts (Secure Boot, TPM, hardware attestation).
- Background in large-scale capacity planning at a hyperscaler or cloud provider.
Culture & Benefits
- Competitive compensation and optional equity donation matching.
- Generous vacation and parental leave.
- Flexible working hours and collaborative office spaces.
- Visa sponsorship available for eligible candidates.
- Collaborative, research-driven environment focused on high-impact AI safety.