Get hands-on! Observability with Prometheus and Grafana.
Observability in K8s: metrics, logs, and traces.
It is impossible to understand what is going wrong in an incident, or to get to its root causes, without clear observability and visibility. In traditional development models, monitoring focused on infrastructure and relied on logs. In the cloud-native space, an entire ecosystem must be monitored and understood. It becomes essential to understand a combination of metrics, traces, and logs, focusing not just on the infrastructure but also on applications, performance, and user experience.
Observability and visibility are the two primary ways of identifying and understanding what is going wrong during an incident, and of digging into it afterward. When incidents occur in a cloud-native environment, the complexity of the infrastructure makes it difficult to get clear visibility into what happened and how the incident might be fixed. Observability and visibility go hand-in-hand, providing ways not only to inspect, understand, and fix incidents as they are happening but also to inform ongoing incident-prevention work, dive into systemic or root causes, and, most of all, build greater resilience.
Visibility mirrors what would traditionally be thought of as monitoring. Its primary function is to indicate that something is wrong and provide the basic metrics needed for troubleshooting. Monitoring has traditionally been the domain of ops engineers, but this has shifted to become a developer concern as well. With the complexity of containerized applications, developers are best positioned to understand what might be going wrong. In parallel, visibility has expanded to introduce new tools and techniques for investigating and diagnosing issues, in many cases built specifically for developers.
For example, a service catalog provides a centralized "source of truth", listing services, their ownership and dependencies, resources, and other metadata, essentially delivering on the "single pane of glass" concept where a developer can gain instant visibility into the full picture. Another example is the need for distributed tracing. A distributed system spans multiple services, and to locate an issue, a single logical trace that can span these services is necessary.
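As a rough illustration of the catalog idea, the sketch below models catalog entries as plain records and answers one question a developer might ask during an incident: which services sit downstream of this one? The service names, team names, and the `ServiceEntry` and `transitive_dependencies` helpers are all hypothetical and not taken from any particular catalog product.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceEntry:
    """One catalog record: who owns a service and what it depends on."""
    name: str
    owner: str
    dependencies: list[str] = field(default_factory=list)


def transitive_dependencies(catalog: dict[str, ServiceEntry], name: str) -> set[str]:
    """Return every service reachable from `name` by following dependencies."""
    seen: set[str] = set()
    stack = list(catalog[name].dependencies)
    while stack:
        dep = stack.pop()
        if dep in seen:
            continue
        seen.add(dep)
        # Services missing from the catalog are still reported, just not expanded.
        if dep in catalog:
            stack.extend(catalog[dep].dependencies)
    return seen


# Hypothetical services and owners, for illustration only.
CATALOG = {
    "checkout": ServiceEntry("checkout", "team-payments", ["cart", "billing"]),
    "cart": ServiceEntry("cart", "team-storefront", ["inventory"]),
    "billing": ServiceEntry("billing", "team-payments"),
    "inventory": ServiceEntry("inventory", "team-warehouse"),
}
```

Calling `transitive_dependencies(CATALOG, "checkout")` walks the dependency graph and returns the set of services that an incident in `checkout` might involve, along with the owners to contact via `CATALOG[name].owner`.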
Observability is the constant monitoring of system and business KPIs with the goal of understanding why something is happening. It goes beyond the here-and-now of visibility (which itself is key to observability) and extends to the analysis and understanding of broader problems or issues, the underlying system, and root causes. There is considerable overlap between visibility and observability. Observability simply encompasses more, including insight and potential actionability.
Distributed tracing can be a very useful tool to enable a developer to locate issues within a complicated graph of microservices. For “deep systems”, where a single user’s request is often handled by multiple layers of services before returning a result, it is essential to be able to observe the path the request took through the system.
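To make the idea of following a request's path concrete, here is a minimal sketch of trace-context propagation using only the Python standard library. It is not the API of any tracing library. It assumes each service forwards a `traceparent` header in the W3C Trace Context layout (version, trace id, parent span id, flags), and the `Span`, `start_span`, and `handle_request` names and the three services are hypothetical.

```python
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    trace_id: str             # shared by every span in one request
    span_id: str              # unique to this unit of work
    parent_id: Optional[str]  # None for the root span
    service: str


def start_span(service: str, headers: dict[str, str], collected: list[Span]) -> dict[str, str]:
    """Record a span for `service` and return headers for its downstream calls."""
    incoming = headers.get("traceparent")
    if incoming:
        # Continue the caller's trace: reuse its trace id, remember its span id.
        _version, trace_id, parent_id, _flags = incoming.split("-")
    else:
        # No context arrived, so this service starts a new trace.
        trace_id, parent_id = secrets.token_hex(16), None
    span = Span(trace_id, secrets.token_hex(8), parent_id, service)
    collected.append(span)
    return {"traceparent": f"00-{trace_id}-{span.span_id}-01"}


def handle_request() -> list[Span]:
    """Simulate one request passing through three hypothetical services."""
    spans: list[Span] = []
    gateway_headers = start_span("gateway", {}, spans)
    orders_headers = start_span("orders", gateway_headers, spans)
    start_span("payments", orders_headers, spans)
    return spans
```

Because every span carries the same trace id and points at its parent's span id, the collected spans can be stitched back into the single logical path the request took, which is what a tracing backend does at much larger scale.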
The "why" of observability can inform both the post-incident rundown and the postmortem. Of potentially much greater, lasting value, it can also drive the way applications are created from the outset. That is, cloud-native developers can practice "observability-driven development (ODD)": "defining instrumentation to determine what is happening in relation to a requirement before any code is written". “Just as you wouldn’t accept a pull-request without tests, you should never accept a pull-request unless you can answer the question, ‘how will I know when this isn’t working?’”
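One way to answer "how will I know when this isn't working?" is to decide on the metric before writing the feature. The sketch below is an assumption-laden illustration, not a prescribed ODD workflow: it counts successes and errors for a hypothetical `checkout` function and renders them in the Prometheus text exposition format. The metric name `checkout_requests_total` and the `instrumented` decorator are invented for this example, and a real service would more likely use a Prometheus client library than hand-rendered text.

```python
import functools
from collections import Counter

# Hypothetical metric name; counts are kept in-process for this sketch.
METRIC = "checkout_requests_total"
_outcomes: Counter = Counter()


def instrumented(func):
    """Count each call to `func` as a success or an error."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            result = func(*args, **kwargs)
        except Exception:
            _outcomes["error"] += 1
            raise
        _outcomes["success"] += 1
        return result
    return wrapper


@instrumented
def checkout(cart_total: float) -> str:
    """Hypothetical business function; rejects non-positive totals."""
    if cart_total <= 0:
        raise ValueError("cart total must be positive")
    return "ok"


def render_metrics() -> str:
    """Render the counts in the Prometheus text exposition format."""
    lines = [f"# TYPE {METRIC} counter"]
    for outcome in sorted(_outcomes):
        lines.append(f'{METRIC}{{outcome="{outcome}"}} {_outcomes[outcome]}')
    return "\n".join(lines) + "\n"
```

With the error count exposed, an alert on a rising `outcome="error"` series becomes the agreed signal that the requirement is no longer being met.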
Observability can become a part of the development process itself, because it is possible to flip the script on development and use production to drive better code. How can this deliver benefits for developers? The insight gathered from shipping and running applications will strengthen future development by:
Follow the step-by-step instructions to configure Prometheus and Grafana on DigitalOcean managed Kubernetes.
Time for your next lesson!