Software Engineer, System Enablement (AI)
Job Description
TL;DR
Software Engineer, System Enablement (AI): Responsible for the architectural and engineering backbone of the company's infrastructure, with a focus on system software, networking, platform architecture, fleet-level monitoring, and performance optimization. The role owns the end-to-end bring-up and bootstrap path that takes new systems and compute nodes from bare metal or early access, in lab or production/cloud environments, to schedulable fleet capacity.
Location: San Francisco or Seattle, USA
Salary: $293K – $455K, plus equity
Company
The company is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity.
What you will do
- Own the end-to-end bring-up and bootstrap path for new systems and compute nodes, from bare metal or early access in lab or production/cloud environments through to schedulable fleet capacity.
- Build and maintain first-class golden image and provisioning workflows across lab and production environments, including working with partner-provided base images and reconciling OS and version requirements.
- Integrate nodes into our fleet infrastructure and IaC pipelines (Terraform, Chef, etc.), ensuring cloud resources map cleanly onto our internal lifecycle expectations.
- Ensure new hardware is reachable and schedulable, including cases where new SKUs require changes to scheduler integration.
- Drive registration and inventory correctness, including hands-on support to get nodes registered and visible end-to-end.
- Collaborate with partner teams to implement baseline health and telemetry bring-up.
Requirements
- BS in CS/EE (or equivalent practical experience).
- 5+ years of experience in systems software development and in building and operating Linux-based infrastructure in production or pre-production environments.
- Strong, hands-on experience with Kubernetes cluster operations.
- Experience with Infrastructure-as-Code / config management (Terraform, Chef/Ansible, etc.).
- Experience with provisioning and imaging (PXE/iPXE, golden images, cloud-init/user-data).
- Networking fundamentals (L2/L3, routing, DNS, firewalling; comfort debugging reachability).
- Proven ability to write automation in Python/Go/Bash and ship operational tooling/runbooks.
Nice to have
- Experience bringing up new hardware platforms (early silicon/servers/NICs) in a lab setting and turning them into stable fleet capacity.
- Multi-cloud operational experience (Azure/GCP/AWS/OCI), especially with compute pools (e.g., VMSS / instance pools).
- Experience building telemetry/health pipelines (agent-based metrics/logging, health rollups, readiness criteria).
- Familiarity with WAN, peering, and multi-site network concepts for cluster deployments.
Culture & Benefits
- We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products.
- AI is an extremely powerful tool that must be created with safety and human needs at its core.
- We must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.