Principal Software Engineer, Site Reliability (AI)
Job description
TL;DR
Principal Software Engineer, Site Reliability (AI/SRE): Building and scaling AI-powered SRE platforms and systems that meet compliance and SLA commitments, with an emphasis on automated remediation, incident detection, and platform adoption. Focus on architecting large-scale distributed systems, designing internal platforms for other engineering teams, and integrating AI capabilities into reliability workflows.
Location: Bucharest, Romania (Hybrid/Remote flexibility varies by team)
Company
A category-leading enterprise software company specializing in the transformative power of automation to change how the world works.
What you will do
- Design and build AI-powered SRE platform systems as products that other engineering teams depend on in their critical path.
- Manage livesite monitoring rotations, handle escalations, and drive effective mitigations via detailed post-mortems.
- Drive availability, scalability, and performance improvements based on production learnings and codified best practices.
- Onboard other teams onto SRE platforms by writing integrations and removing friction personally.
- Mentor software engineers through hands-on coaching, advice, and training opportunities.
- Influence engineering-wide process improvements and reliability architectural changes.
Requirements
- 10+ years of experience architecting and engineering large-scale, distributed commercial applications.
- Proven track record of building complex internal platforms adopted by multiple teams at a large company.
- Experience building and maintaining complex AI-powered applications in production.
- Proficiency in one or more OO languages (C#, C++, Java, or Python) with solid computer science fundamentals.
- Deep understanding of microservices, HTTP applications, multithreading, and asynchronous patterns.
- Must be eligible to work in Romania (Bucharest).
Nice to have
- Experience working with or managing production Kubernetes infrastructure.
- Experience with cloud providers (Azure, AWS, GCP) and managed services (AKS, GKE).
- Experience with database backends such as Azure SQL, CosmosDB, MongoDB, or MySQL.
Culture & Benefits
- Flexible work arrangements including hybrid and remote options depending on business needs.
- Inclusive workplace that values diverse backgrounds and equal opportunities for all.
- Product-driven engineering culture where internal tools are treated as products with a focus on user feedback.
- Collaborative environment emphasizing curiosity, generosity, and a fast-moving growth mindset.