11 hours ago

Founding Infrastructure Engineer

$50,000
Work format: remote
Employment type: full-time
Grade: senior
English: B2
Country: US/Canada
Vacancy from a Telegram channel -

Job description

Founding Infrastructure Engineer

Company

Metagov

Conditions

21 hours ago
Senior
Salary: 50K - 50K
Fully remote (preference for Europe, United States, or Canada)
Remote Contract DevOps Jobs by Metagov

Skills

Tracing, Capacity Planning, Fault Tolerance, Compute, Incident Response, Performance, Load Testing, GPU, SRE, Distributed Systems, Open Source, Monitoring, Orchestration, Observability, AWS, MCP, vLLM, Analytics, Model Routing, Fallback Design, Open WebUI, CSCS, Infomaniak, Agent Tooling, Routing Transparency, Endpoint Provenance, HPC, Multi-Region Deployment

About the Role

You will be the technical owner of the platform's operational backbone. You will harden the platform for major launches, perform load testing, and build fallback routing and per-agent monitoring. You will implement end-to-end observability and integrated trace analysis across heterogeneous infrastructure, ship downtime warnings and fallback behavior, and implement routing transparency and endpoint provenance so users can verify which backend served their inference. You will improve performance of public endpoints, integrate programmatic infrastructure interfaces such as an MCP server, and make the utility more transparent and contributable. You will set priorities autonomously, operate production inference and ML serving infrastructure, and coordinate with cloud providers, HPC centers, and other infrastructure partners. Occasional travel for team workshops may be required.

Requirements

  • Significant experience operating production inference or ML serving infrastructure (vLLM, model routing, multi-region deployments, GPU-backed services)
  • Strong distributed systems and SRE instincts including observability, incident response, fallback design, and capacity planning
  • Comfort working across heterogeneous infrastructure partners including cloud providers and HPC centers
  • Experience orchestrating many stacks and integrating open-source projects
  • Maintainer and integrator experience with pride in operational excellence
  • Ability to work autonomously in a small team and travel occasionally for workshops

Responsibilities

  • Harden platform for launches
  • Perform load testing
  • Build fallback routing
  • Set up per-agent monitoring
  • Build end-to-end observability across stacks
  • Ship downtime warnings and fallback behavior
  • Implement routing transparency and endpoint provenance
  • Improve production service performance
  • Integrate MCP server or programmatic infrastructure interfaces
  • Make infrastructure transparent and contributable
  • Operate and maintain production inference and ML serving infrastructure
  • Coordinate with heterogeneous infrastructure partners
  • Orchestrate and integrate multiple open-source stacks

The vacancy text is reproduced unchanged.
