Senior/Principal Infrastructure Engineer (AI)
Job description
TL;DR
Senior/Principal Infrastructure Engineer (AI): Deploying and supporting Karman systems within high-density data center environments, with an emphasis on Linux administration, networking, and power infrastructure. Focus on building Tier 3 support policies, optimizing rack systems, and ensuring high availability of AI infrastructure.
Location: Based onsite at company headquarters in Ann Arbor, Michigan, with flexibility for occasional remote work.
Salary: $160,000 – $195,000 base compensation
Company
The company is an NVIDIA-backed edge AI company enabling greater visibility and control of power utilization in energy-intensive infrastructure like the electric grid and data centers.
What you will do
- Deploy and configure Karman systems in high-density data centers, ensuring adherence to organizational standards.
- Manage and maintain B300 or equivalent rack systems, including PDU and PSU optimization.
- Perform advanced Linux system administration, including patch management and performance tuning.
- Design and implement network configurations, including routing, switching, and connectivity optimization.
- Plan capacity requirements and identify scaling bottlenecks to meet growing infrastructure demands.
- Participate in an on-call rotation to ensure 24/7 system availability and rapid incident response.
Requirements
- 8+ years of experience with Linux system administration (RHEL, Ubuntu, CentOS, etc.).
- Proven experience deploying and managing applications in high-density data center environments.
- Strong knowledge of rack systems, PDUs, PSUs, cooling systems, and power management.
- Hands-on experience with TCP/IP, DNS, DHCP, VLANs, and routing protocols.
- Ability to use first-principles debugging in environments with evolving processes.
- Willingness to travel up to 20% of the time, including international travel.
Nice to have
- Proficiency with Ansible, Puppet, or Chef.
- Experience with Tailscale or WireGuard.
- Knowledge of Prometheus, Grafana, or the ELK stack.
- Familiarity with Docker and Kubernetes.
Culture & Benefits
- Flexible work environment with flexible paid time off.
- Competitive health, dental, and vision insurance.
- 401(k) with employer match.
- Mentorship and growth opportunities within a collaborative team.