TL;DR
Site Reliability Engineer (SRE): Help build and maintain highly reliable, scalable systems in a role that combines software engineering and operations expertise. You will ensure services meet ambitious reliability targets while enabling rapid development and deployment, with an emphasis on automation, monitoring, and system reliability. The focus is on implementing and maintaining dashboards, performance monitoring, infrastructure automation, and security compliance.
Location: Fully remote for candidates who reside outside a 50-mile radius of our San Ramon office. For candidates who reside within 50 miles of the San Ramon office, this role is hybrid and requires 3 days a week (M, W, Th) in the office.
Salary: $71,800 - $190,000 USD
Company
hirify.global is a leading provider of cloud contact center software, bringing the power of cloud innovation to customers worldwide.
What you will do
- Design and implement comprehensive dashboards for OS/platform and application-level monitoring.
- Establish and maintain SLIs, SLOs, and error budgets for the service.
- Maintain continuous integration and deployment pipelines.
- Develop and maintain infrastructure using tools like Terraform, Ansible, or similar.
- Monitor and optimize cloud resource usage and costs.
- Build and maintain common services like notification systems, caching layers, and message queues.
Requirements
- 3+ years managing large-scale production environments.
- Comfortable with 24/7 on-call responsibilities and incident response.
- Strong Linux/Unix system administration skills.
- Understanding of TCP/IP, DNS, load balancing, and network security.
- Experience with SQL and NoSQL databases in production environments.
- Proficiency in at least two of: Python, Shell, PHP, Java, or similar languages.
- Experience with at least one of AWS, GCP, or Azure infrastructure and services.
- Hands-on experience with Docker, Kubernetes, and container orchestration.
- Experience with Prometheus, Grafana, ELK stack, or similar tools.
- Proficiency with Terraform, CloudFormation, or similar tools.
- Expert-level Git usage and collaborative development practices.
- Experience defining and maintaining service level objectives.
- Understanding of error budget concepts and implementation.
- Track record of identifying and eliminating repetitive manual work.
Nice to have
- Bachelor's degree in Computer Science, Engineering, or equivalent experience.
- Experience with microservices architecture and distributed systems.
- Knowledge of security best practices and compliance frameworks.
- Experience with chaos engineering and reliability testing.
- Previous experience in an SRE or DevOps role at a technology company.
- Contributions to open-source projects or technical communities.
Culture & Benefits
- Health, dental, and vision coverage, beginning on the first day of employment.
- Access to an innovative mental health support platform that offers personalized care and resources.
- Generous employee stock purchase plan.
- Paid time off, company-paid holidays, paid volunteer hours, and 12 weeks of paid parental leave.