Site Reliability Engineer (SRE) - FS
Job description
Position Overview
We are seeking a skilled and motivated Site Reliability Engineer (SRE) with 3+ years of experience in SRE, DevOps, or infrastructure-focused roles. The role focuses on ensuring the availability, performance, and scalability of platforms and services using modern observability and orchestration tools.
Key Responsibilities
- Design and maintain scalable, resilient, and secure infrastructure.
- Monitor system health using ELK, Grafana, Prometheus, and ITRS Geneos.
- Manage containerized applications in Kubernetes environments.
- Develop automated deployment pipelines and configuration management tools.
- Collaborate with development and product teams to embed reliability into service design.
- Support real-time data streaming platforms using Kafka.
- Respond to incidents, conduct root cause analysis, and implement preventive measures.
- Contribute to SRE best practices, documentation, and runbooks.
Required Skills & Experience
- 3+ years in SRE, DevOps, or infrastructure-focused roles.
- Hands-on experience with observability tools (ELK, Grafana, Prometheus, ITRS Geneos).
- Proficiency in Kubernetes and container orchestration.
- Experience with Kafka for data streaming.
- Strong scripting skills (Python, Bash) and automation experience.
- Solid understanding of Linux, networking, and cloud infrastructure.
- Strong troubleshooting and collaboration skills.
Preferred Qualifications
- Experience in financial services or regulated industries.
- Familiarity with CI/CD pipelines and Infrastructure as Code (Terraform, Ansible).
- Certifications in Kubernetes, AWS, or related technologies.
