Senior Software Engineer (Compute Architecture)
Job description
TL;DR
Senior Software Engineer (Go): Build and operate a software control plane for hardware lifecycle management across large-scale GPU data centers, with an emphasis on distributed systems and hardware-aware automation. The role focuses on developing reliable APIs for BMCs, firmware state, and server health to ensure platform resilience at fleet scale.
Location: Hybrid (Livingston, NJ / New York, NY / Sunnyvale, CA / Bellevue, WA). Remote may be considered for candidates located more than 30 miles from an office. Must be a U.S. person (citizen, national, lawful permanent resident/green card holder, refugee, or asylee) to comply with U.S. Government export regulations.
Salary: $182,000 – $242,000
Company
A specialized cloud provider for AI that delivers high-performance infrastructure and tools, enabling innovators to build and scale AI with confidence.
What you will do
- Design, build, and operate Go-based services that manage the lifecycle of large-scale GPU data center infrastructure.
- Build automation for data center bring-up, hardware discovery, health monitoring, and production operations.
- Develop reliable APIs, services, and workflows for managing BMCs, firmware state, and rack-level infrastructure.
- Improve observability, alerting, and operational tooling using Prometheus and Grafana.
- Translate hardware failure modes and incidents into software improvements to increase platform resilience.
- Partner with hardware, infrastructure, and operations teams to design systems that work safely at fleet scale.
Requirements
- 5+ years of experience building and operating infrastructure or backend systems.
- Strong proficiency in Go for building production services and tools.
- Experience designing and building gRPC and REST APIs.
- Experience with Kubernetes and containerized workloads in production environments.
- Familiarity with observability tooling such as Prometheus and Grafana.
- Must be a U.S. person or otherwise eligible to access export-controlled information without requiring additional authorization.
Nice to have
- Experience working with GPU-based systems.
- Experience with low-level hardware management such as BMCs or Redfish.
- Experience operating large-scale distributed systems or high-throughput infrastructure.
- Contributions to open-source projects in the Go or Redfish ecosystems.
Culture & Benefits
- Medical, dental, and vision insurance (100% company-paid).
- 401(k) with generous employer match and Employee Stock Purchase Program (ESPP).
- Flexible PTO and paid parental leave.
- Daily catered lunch at office and data center locations.
- Comprehensive wellness support including Spring Health, Carrot, and Kinside.
- Tuition reimbursement and company-paid life insurance.