Hands-on Site Reliability Engineering: Build Capability to Design, Deploy, Monitor, and Sustain Enterprise Software Systems at Scale (English Edition)

Shamayel Mohammed Farooqui & Vishnu Vardhan Chikoti

Language: English

Publisher: BPB Publications

Published: Jul 6, 2021


A comprehensive guide with basic to advanced SRE practices and hands-on examples.


● Demonstrates how to practice site reliability engineering and explains its fundamental concepts.

● Presents real-world examples and proven techniques for putting SRE into production.

● Introduces you to DevOps, advanced SRE techniques, and popular tools in use.


Hands-on Site Reliability Engineering (SRE) is a guide to learning and practicing the activities essential to the smooth functioning of enterprise systems, from designing and deploying enterprise software to running it at scale efficiently and reliably.

The book explores the fundamentals of SRE and the related terms, concepts, and techniques used by SRE teams and experts. It discusses the essential elements of an IT system, including microservices, application architectures, types of software deployment, and concepts like load balancing. It explains techniques for delivering timely software releases using containerization and CI/CD pipelines. It also covers how to track and monitor application performance using Grafana, Prometheus, and Kibana, and how to extend monitoring by building full-stack observability into the system.

The book also covers chaos engineering, types of system failures, designing for high availability, DevSecOps, and AIOps.


● Learn the best techniques and practices for building and running reliable software.

● Explore observability and popular methods for effective monitoring of applications.

● Work with SLIs, SLOs, Error Budgets, and Error Budget Policies to manage failures.

● Learn to practice continuous software delivery using blue/green and canary deployments.

● Explore chaos engineering, SRE best practices, DevSecOps, and AIOps.


This book caters to experienced IT professionals, application developers, software engineers, and all those who are looking to develop SRE capabilities at the individual or team level.


Shamayel M. Farooqui is a technology leader who specializes in driving digital transformation for organizations and is the author of 'Enterprise DevOps Framework - Transforming IT Operations'.

He has expertise in implementing IT security, cloud migrations, and IT automation, and a proven track record of building teams of skilled site reliability engineers who deliver solutions for optimizing and running hybrid, multi-cloud environments. He thrives on building creative solutions to complex IT problems and has mastered the art of building reusable automation that drives efficiency in IT/business processes and cloud management. He is passionate about innovation and has developed frameworks that enterprises can adopt to foster a culture of innovation in their teams.

Vishnu Vardhan Chikoti has diverse experience in application and database design and development, microservices and micro-frontends, DevOps, site reliability engineering, and machine learning.

With an ability to conduct deep analysis, strong execution skills, and an innovative mindset, he has successfully led R&D teams in building engineering solutions that improve the reliability of applications. He is also an expert in building high-volume transaction processing applications for the middle- and back-office functions of investment banks using a variety of architectures.