Site Reliability Engineer (SRE) - FS

Job description

Position Overview
We are seeking a skilled and motivated Site Reliability Engineer (SRE) with 3+ years of experience. The role focuses on ensuring the availability, performance, and scalability of platforms and services using modern observability and orchestration tools.

Key Responsibilities

Design and maintain scalable, resilient, and secure infrastructure.
Monitor system health using ELK, Grafana, Prometheus, and ITRS Geneos.
Manage containerized applications in Kubernetes environments.
Develop automated deployment pipelines and configuration management tools.
Collaborate with development and product teams to embed reliability into service design.
Support real-time data streaming platforms using Kafka.
Respond to incidents, conduct root cause analysis, and implement preventive measures.
Contribute to SRE best practices, documentation, and runbooks.

Required Skills & Experience

3+ years in SRE, DevOps, or infrastructure-focused roles.
Hands-on experience with observability tools (ELK, Grafana, Prometheus, ITRS Geneos).
Proficiency in Kubernetes and container orchestration.
Experience with Kafka for data streaming.
Strong scripting skills (Python, Bash) and automation experience.
Solid understanding of Linux, networking, and cloud infrastructure.
Strong troubleshooting and collaboration skills.

Preferred Qualifications

Experience in financial services or regulated industries.
Familiarity with CI/CD pipelines and Infrastructure as Code (Terraform, Ansible).
Certifications in Kubernetes, AWS, or related technologies.

Consultant

Jennie Jiang

Principal Consultant, AmbTech
