Staff Engineer, Engineering Compute Infrastructure and Grid Operations (Semiconductor)
Job description
TL;DR
Staff Engineer, Engineering Compute Infrastructure and Grid Operations (Semiconductor): Design and operate large-scale compute infrastructure for chip design and verification, with an emphasis on grid job management and distributed storage. The focus is on improving job reliability, diagnosing system failures, and optimizing I/O performance in high-throughput environments.
Location: Westborough, MA; Austin, TX; or Santa Clara, CA. Must be eligible to access export-controlled information under U.S. export control laws (EAR).
Salary: $128,000 – $189,370 per annum
Company
The company provides essential semiconductor solutions for data infrastructure across enterprise, cloud, AI, and carrier architectures.
What you will do
- Own and evolve grid job management infrastructure for large regressions and high-volume batch workloads.
- Debug and resolve grid job failures, including scheduling issues, hung jobs, and resource starvation.
- Improve job reliability by implementing watchdogs, retries, heartbeats, and failure detection.
- Manage shared engineering storage systems, resolving issues related to I/O performance, file contention, and permissions.
- Design and deploy monitoring, logging, and metrics to proactively detect infrastructure problems.
- Act as a technical bridge between engineering users, tools teams, and central IT, translating user requirements into infrastructure improvements.
Requirements
- Bachelor’s degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent experience.
- 8+ years of experience in compute infrastructure, grid operations, or large-scale engineering environments.
- Strong experience with grid or batch schedulers such as LSF, UGE, Slurm, or PBS.
- Deep Linux systems knowledge, including process management and resource monitoring.
- Experience with shared storage systems including NFS and enterprise filers.
- Strong scripting skills in Python, shell, or similar languages.
Nice to have
- Experience supporting EDA or engineering compute workloads.
- Familiarity with job controller or wrapper-based execution architectures.
- Experience operating environments with thousands of concurrent batch jobs.
- Exposure to cloud or hybrid compute environments.
Culture & Benefits
- Comprehensive benefits covering financial well-being, family support, and mental/physical health.
- Employee stock purchase plan with a two-year lookback.
- Robust mental health resources and family support programs.
- Recognition and service awards to celebrate milestones and contributions.