TL;DR

Cloud/Site Reliability Engineer: Evolving cloud-hosted environments to be more self-aware, self-healing, and scalable, ensuring high availability and performance of applications and services. Focus on proactive monitoring, root cause analysis, and automation-driven remediation.

Location: Providence, RI, US

Salary: $117,880 - $240,000

Company

Brightstar is an innovative, forward-thinking global leader in lottery that builds on its renowned expertise in delivering secure technology and producing reliable, comprehensive solutions for its customers.

What you will do

Design and refine monitoring strategies using tools like Dynatrace, Prometheus, and ELK.
Develop and implement self-healing capabilities that proactively detect and remediate issues, minimizing manual intervention and downtime.
Analyze operational workflows to identify repetitive tasks and transform them into scalable, automated solutions.
Manage cloud infrastructure and services.
Participate in a 24x7 on-call rotation with after-hours support for critical incident response.

Requirements

Hands-on experience in cloud operations or site reliability engineering.
Practical experience managing public cloud infrastructure and services (Azure or AWS knowledge preferred).
Proficiency in scripting and automation (Terraform, PowerShell, Python, Bash).
Experience with Infrastructure as Code (IaC) and GitOps principles.
Hands-on experience with Kubernetes (K8s) and container orchestration.
Expertise in monitoring tools (Dynatrace, Datadog, Prometheus, ELK).

Nice to have

Experience applying agentic AI techniques to drive intelligent automation, optimize cloud services, accelerate troubleshooting and root-cause analysis, and enhance system resilience and recoverability.
Familiarity with AI/ML Ops or AI-assisted observability tools.
Thorough understanding of Java application workloads and Java performance topics.

Culture & Benefits

Be part of a forward-thinking Cloud Infrastructure Engineering, Operations & Automation team that values prevention over reaction, automation over repetition, and collaboration over silos.
401(k) Savings Plan with Company contributions.
Health, dental, and vision insurance.
Paid time off, wellness programs, and identity theft insurance.