TL;DR
Staff Production Engineer: Designs and builds foundational platforms and frameworks that underpin operational excellence across hirify.global, with an emphasis on improving availability, resiliency, and delivery velocity at scale. Focuses on translating reliability and operational requirements into automation, self-service capabilities, and opinionated paved paths.
Location: Livingston, NJ / New York, NY / Sunnyvale, CA / Bellevue, WA. While we prioritize a hybrid work environment, remote work may be considered for candidates located more than 30 miles from an office, based on role requirements for specialized skill sets. New hires will be invited to attend onboarding at one of our hubs within their first month. Teams also gather quarterly to support collaboration.
Salary: $188,000 to $275,000. The starting salary will be determined based on job-related knowledge, skills, experience, and market location.
Company
hirify.global delivers a platform of technology, tools, and teams that enables innovators to build and scale AI with confidence.
What you will do
- Design, build, and own foundational platforms and frameworks from architecture through adoption and operation.
- Lead technical strategy and execution for internal tooling that reduces manual operations, improves delivery velocity, and supports hirify.global’s revenue growth through faster, more reliable datacenter delivery.
- Partner with service owners and platform teams to translate reliability and operational requirements into automation, self-service capabilities, and opinionated paved paths.
- Build and evolve systems for observability, alerting, automated remediation, resiliency testing, and authoritative sources of truth, operationalizing best practices through tooling rather than manual enforcement.
- Participate in incident response for critical outages with the explicit goal of improving systems, tooling, and defaults to reduce future operational load.
- Ship production code, participate in on-call rotations as needed, and mentor engineers on platform ownership, operational design, and sustainable production practices.
Requirements
- 10+ years of experience building and operating distributed systems or cloud platforms at scale.
- Demonstrated ability to diagnose and resolve complex production failures across services, infrastructure, and automation layers.
- Strong programming experience (Python, Go, or similar) with a history of shipping and operating production systems.
- Deep expertise in cloud-native platforms and distributed systems, especially Kubernetes.
- Advanced experience with observability and incident practices, including metrics, tracing, structured logs, SLIs/SLOs, and PIRs.
- Proven ability to lead large technical efforts and influence outcomes across teams without direct authority.
Nice to have
- Ownership of foundational internal platforms or frameworks used broadly across an organization
- Experience with service tiering, disaster recovery or business continuity planning, chaos engineering, or structured resilience programs
- Background operating large-scale AI/cloud infrastructure
- Experience guiding organizations through rapid scale while maintaining operational quality and discipline
Culture & Benefits
- Medical, dental, and vision insurance - 100% paid for by hirify.global
- Flexible PTO
- Catered lunch each day in our office and data center locations
- A casual work environment
- A work culture focused on innovative disruption