Kubernetes on TV: Watch How Your Services are Operating in Production
This article introduces a way to watch Kubernetes services in production through a comprehensive visualization that provides quick yet effective insight into the health of your applications. It is a multi-cluster federation and consolidation approach that leverages the native health checks of containers and pods in Kubernetes. On top of that, the proposed visualization provides advanced aggregation and simple dashboards that let you quickly assess how services are operating in each individual namespace.
Disclaimer: I’m the author of RealOpInsight.
What You’ll Learn
This work intends to describe a visualization for Kubernetes that provides:
- Tactical dashboards focused on namespace activity monitoring.
- Quick and easy root-cause analysis when problems occur.
- Comprehensive visualization for the proactive monitoring that organizations typically need for T1/T2 help-desk and NOC (Network Operations Center) teams.
- Federated multi-cluster and multi-tenant visualization.
- Service level indicators to enable analysis of failure trends over time.
- Email notifications when a namespace’s consolidated status changes.
Building Consolidated Namespace Status
As illustrated in the figure below, the approach rests on the following foundations:
- All namespaces are automatically discovered and, for each of them, its components (containers, pods and services) are also automatically discovered to generate a namespace microservice tree that captures the relationships among those components. Both the discovery of components and the binding of relationships within the microservice tree rely entirely on pods’ labels and services’ selectors in Kubernetes.
- Each namespace’s microservice tree is structured as follows: at the bottom, containers are bound to their pods; pods are in turn bound to their services; and services are finally bound to the namespace, which is expected to represent a virtual application space.
- Within the microservice tree, component statuses are propagated bottom-up. The propagation is designed to always highlight and carry upward any abnormal behavior or situation that may suggest a potential failure. For example, consider a service backed by replicated pods. If one of those pods fails, the propagation shows a problem on the service to highlight that you should investigate a potential failure in the underlying pods, even if other running pods still match the service’s selectors (see illustration below). Likewise, you will always be warned when a service’s selectors match no pod.
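The discovery and propagation rules above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not RealOpInsight’s implementation: the severity scale (Status), the mapping from pod phases to severities and the "worst status wins" rule are choices made for this example. As with Kubernetes Service selectors, a service selects a pod when every key/value pair of its selector appears in the pod’s labels, and an empty selector selects nothing.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Status(IntEnum):
    """Hypothetical severity scale: a higher value means a worse state."""
    NORMAL = 0
    MINOR = 1
    MAJOR = 2
    CRITICAL = 3


# Assumed mapping from Kubernetes pod phases to severities; other phases count as MAJOR.
PHASE_STATUS = {
    "Running": Status.NORMAL,
    "Pending": Status.MINOR,
    "Unknown": Status.MAJOR,
    "Failed": Status.CRITICAL,
}


@dataclass
class Pod:
    name: str
    labels: dict
    phase: str
    containers_ready: list = field(default_factory=list)  # one bool per container


@dataclass
class Service:
    name: str
    selector: dict


def selects(selector, labels):
    """True when every selector key/value pair is present in the pod's labels."""
    return bool(selector) and all(labels.get(k) == v for k, v in selector.items())


def pod_status(pod):
    """A pod is as bad as its phase or its worst container, whichever is worse."""
    phase = PHASE_STATUS.get(pod.phase, Status.MAJOR)
    containers = [Status.NORMAL if ready else Status.MAJOR for ready in pod.containers_ready]
    return max([phase, *containers])


def service_status(service, pods):
    """Propagate pod statuses up to a service: the worst matching pod wins."""
    matched = [p for p in pods if selects(service.selector, p.labels)]
    if not matched:
        # A selector that matches no pod is always flagged.
        return Status.MAJOR
    return max(pod_status(p) for p in matched)


def namespace_status(services, pods):
    """Propagate service statuses up to the namespace (worst service wins)."""
    return max((service_status(s, pods) for s in services), default=Status.NORMAL)


if __name__ == "__main__":
    pods = [
        Pod("web-1", {"app": "web"}, "Running", [True]),
        Pod("web-2", {"app": "web"}, "Failed", [False]),
        Pod("db-0", {"app": "db"}, "Running", [True]),
    ]
    services = [
        Service("web", {"app": "web"}),
        Service("db", {"app": "db"}),
        Service("cache", {"app": "cache"}),  # no pod carries this label
    ]
    for s in services:
        print(s.name, service_status(s, pods).name)
    print("namespace", namespace_status(services, pods).name)
```

Running it prints CRITICAL for web (one failed replica is enough, although web-1 is still running), NORMAL for db, MAJOR for cache (no pod matches its selector) and CRITICAL for the namespace.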
As the saying goes, practice beats theory. In this part, we demonstrate with RealOpInsight how the approach presented above can be implemented on your Kubernetes clusters, and how, in a few steps, you can have it running in your Kubernetes monitoring environment within minutes.
Assuming that you’re running Docker on your local machine, the following command starts an instance of RealOpInsight. See the installation guide for deploying on Kubernetes.
```shell
$ docker run -d \
  --name realopinsight \
  --network host \
  --publish 4583:4583 \
```
Accessing RealOpInsight UI
Once the container has started (check with docker ps), you should be able to access the RealOpInsight UI at
The default credentials to log in are:
The administrator home page looks like the following screenshot.
Integration with a Kubernetes Cluster
RealOpInsight requires read-only access to the Kubernetes API. The integration involves the following steps:
- Sign in as administrator (using the default credentials).
- Select the menu Monitoring Sources to open the source configuration page.
- Set the API Endpoint URL to https://kubernetes.default/ (the in-cluster API URL) and, optionally, if the cluster uses a self-signed certificate, check Don't verify SSL certificate.
- Leave the field Auth String Token empty, so that RealOpInsight authenticates against the Kubernetes API using its RBAC service account. That service account (named realopinsight), along with its RBAC permissions, is created during the deployment with Helm.
- Click on Add as source and select an ID for the source when prompted.
- Click on Apply to finish the operation.
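To make "read-only access" concrete, here is a sketch of the kind of calls a monitoring tool can issue, written against the official kubernetes Python client (pip install kubernetes). It is not RealOpInsight’s code; make_api, snapshot and FakeCoreV1Api are hypothetical names for this illustration. The sketch only uses list operations on namespaces, pods and services, so a service account allowed to list those resources cluster-wide (namespaces are cluster-scoped, hence a ClusterRole) is enough for it; the exact permissions installed by the RealOpInsight Helm chart may differ.

```python
from types import SimpleNamespace as NS


def make_api(in_cluster=True):
    """Build a CoreV1Api client with the official kubernetes package."""
    # Imported lazily so the rest of this sketch runs without the package installed.
    from kubernetes import client, config

    if in_cluster:
        config.load_incluster_config()  # uses the pod's mounted service account token
    else:
        config.load_kube_config()  # uses the local kubeconfig
    return client.CoreV1Api()


def snapshot(api):
    """Collect namespaces, pods and services using list calls only (read-only)."""
    result = {}
    for ns in api.list_namespace().items:
        name = ns.metadata.name
        pods = [
            {
                "name": pod.metadata.name,
                "labels": pod.metadata.labels or {},
                "phase": pod.status.phase,
                "containers_ready": [c.ready for c in (pod.status.container_statuses or [])],
            }
            for pod in api.list_namespaced_pod(name).items
        ]
        services = [
            {"name": svc.metadata.name, "selector": svc.spec.selector or {}}
            for svc in api.list_namespaced_service(name).items
        ]
        result[name] = {"pods": pods, "services": services}
    return result


class FakeCoreV1Api:
    """Offline stand-in exposing the three list calls used by snapshot()."""

    def __init__(self, data):
        self.data = data  # {namespace: {"pods": [...], "services": [...]}}

    def list_namespace(self):
        return NS(items=[NS(metadata=NS(name=n)) for n in self.data])

    def list_namespaced_pod(self, namespace):
        return NS(items=self.data[namespace]["pods"])

    def list_namespaced_service(self, namespace):
        return NS(items=self.data[namespace]["services"])
```

Inside the cluster, snapshot(make_api()) would return one entry per namespace; the fake class lets the collection logic be exercised without a cluster.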
Verify the Kubernetes Source
Select the menu Manage Operations Views to check that all the namespaces within Kubernetes have been successfully discovered and imported, as shown in the screenshot below (list on the right side).
Additionally, the Preview menu lets you see what each namespace’s microservice tree will look like. But that’s not the final goal, so let’s move forward.
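The structure displayed by the preview can be approximated with a small text renderer. This hypothetical render_tree helper is not part of RealOpInsight: it takes a namespace and a nested mapping of services to pods to containers (for instance, derived from label and selector matching) and prints it top-down, the reverse of the bottom-up direction in which statuses are propagated.

```python
def render_tree(namespace, services):
    """Render a namespace's microservice tree as indented text.

    services: {service_name: {pod_name: [container_name, ...]}}
    """
    lines = [namespace]
    for service, pods in sorted(services.items()):
        lines.append(f"  service/{service}")
        for pod, containers in sorted(pods.items()):
            lines.append(f"    pod/{pod}")
            # Containers are the leaves of the tree.
            lines.extend(f"      container/{c}" for c in containers)
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_tree("shop", {
        "web": {"web-1": ["nginx"], "web-2": ["nginx"]},
        "db": {"db-0": ["postgres"]},
    }))
```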
Preparing for the Final Visualization
At this point we’re almost ready to visualize our services, but we first need to prepare our environment:
- Select the menu New User and fill in the form to create a new user. Set the required fields and make sure to set the user profile to Operator; the password must be an alphanumeric string of at least 6 characters. For this tutorial, we assume that the user created is named kops.
- Then select the menu Manage Operations Views and move on to the next step.
- In the user list on the left side, select the username created previously (kops for this tutorial).
- In the namespace list on the right side, select the items the user should visualize. You can hold the Ctrl key to select multiple items. Note that when you have several users, you can assign each of them a specific set of items to visualize. This capability is typically useful for multi-tenant monitoring environments.
- Click on the button Assign to validate your choice.
- We’re done and can move forward to the visualization.
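The assignment steps above amount to a mapping from users to sets of namespace views. The sketch below is a hypothetical model (ViewAssignments and valid_password are illustrative names, not RealOpInsight APIs), and it assumes the password rule means "ASCII letters and digits only, at least 6 characters".

```python
from dataclasses import dataclass, field


def valid_password(password):
    """Assumed reading of the rule: ASCII letters and digits only, at least 6 characters."""
    return len(password) >= 6 and password.isascii() and password.isalnum()


@dataclass
class User:
    name: str
    profile: str  # e.g. "Operator"
    views: set = field(default_factory=set)


class ViewAssignments:
    """Hypothetical model of per-user view assignment for multi-tenant dashboards."""

    def __init__(self, views):
        self.views = set(views)  # namespace views imported from the monitoring sources
        self.users = {}

    def create_user(self, name, profile, password):
        if not valid_password(password):
            raise ValueError("password must be alphanumeric with at least 6 characters")
        self.users[name] = User(name, profile)  # the password itself is not kept here

    def assign(self, username, *views):
        unknown = set(views) - self.views
        if unknown:
            raise KeyError(f"unknown views: {sorted(unknown)}")
        self.users[username].views.update(views)

    def visible_views(self, username):
        """Views shown on this user's dashboard, sorted by name."""
        return sorted(self.users[username].views)


if __name__ == "__main__":
    registry = ViewAssignments(["default", "kube-system", "shop"])
    registry.create_user("kops", "Operator", "kops2024")
    registry.assign("kops", "shop", "default")
    print(registry.visible_views("kops"))  # ['default', 'shop']
```

Each user only sees the views explicitly assigned to them, which is what makes the same instance usable by several tenants.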
Go To Visualization
Log into RealOpInsight as the user you created previously (kops for this tutorial). Upon login, the user’s default dashboard is loaded, showing a comprehensive view like the one in the screenshot below. This dashboard includes:
- A Tactical Overview section on the left side: it provides, for each namespace, a tile showing the overall status propagated through the underlying microservice tree. Clicking on a tile opens the microservice tree console, which provides details on containers, pods and services. This console is further introduced in the next section.
- A Reports section, also on the left side: it provides, for each namespace, a history of pods’ statuses over a selected period of time (the last 30 days by default).
- An Open Events section on the right side: it provides a feed of the latest pod failures, regardless of the affected namespace.
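An Open Events style feed boils down to merging pod events from every namespace and keeping the most recent failures. This is a minimal sketch with hypothetical names (PodEvent, open_events); treating the Failed and Unknown phases as failures is an assumption of the example.

```python
import heapq
from dataclasses import dataclass
from datetime import datetime

# Assumed set of pod phases reported as failures in the feed.
PROBLEM_PHASES = {"Failed", "Unknown"}


@dataclass(frozen=True)
class PodEvent:
    timestamp: datetime
    namespace: str
    pod: str
    phase: str


def open_events(events_by_namespace, limit=10):
    """Latest pod failures across all namespaces, newest first."""
    failures = (
        event
        for events in events_by_namespace.values()
        for event in events
        if event.phase in PROBLEM_PHASES
    )
    # nlargest keeps only `limit` items instead of sorting every event.
    return heapq.nlargest(limit, failures, key=lambda e: e.timestamp)


if __name__ == "__main__":
    t = datetime(2024, 1, 1)
    feed = open_events({
        "shop": [PodEvent(t.replace(hour=1), "shop", "web-2", "Failed")],
        "billing": [PodEvent(t.replace(hour=2), "billing", "api-1", "Unknown")],
    })
    for event in feed:
        print(event.timestamp, event.namespace, event.pod, event.phase)
```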
Explore Failure Impact and Root Causes
Each namespace’s microservice tree is backed by a console that simplifies analyzing the impact of incidents and identifying the root causes of problems.
See the screenshot below for illustration.
Basically, the console provides a Tree View (left side) and a Map (top right side), which display the microservice tree from two exploration perspectives, and a Message Panel (bottom right side) that displays status messages related to containers and pods. There is also a pie chart (bottom left side) showing the ratio of pods by status (running, failed, pending, etc.).
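The data behind a pie chart like this one is a simple count of pods per status. The helper below is a hypothetical sketch (pod_status_ratio is not a RealOpInsight function) that turns a list of pod phases into percentages.

```python
from collections import Counter


def pod_status_ratio(phases):
    """Percentage of pods in each status, most common first, rounded to 0.1."""
    counts = Counter(phases)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {phase: round(100 * n / total, 1) for phase, n in counts.most_common()}


if __name__ == "__main__":
    print(pod_status_ratio(["Running", "Running", "Running", "Failed"]))
    # {'Running': 75.0, 'Failed': 25.0}
```

Because each slice is rounded independently, the percentages may not add up to exactly 100; a chart library would typically take the raw counts instead.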
In this story, we’ve shown a way to watch Kubernetes services in production environments operated by help-desk and Network Operations Center (NOC) teams. We first introduced the basic concepts behind the proposed approach, then demonstrated, step by step, an implementation based on RealOpInsight.
While we mentioned a multi-cluster approach but demonstrated it with only one cluster, it’s worth noting that integrating additional Kubernetes sources in RealOpInsight is just as simple as what has been described above for one source. If you have multiple Kubernetes clusters, just integrate each of them the same way and things should just work. You can even set up visualizations for your users with namespace items coming from different Kubernetes clusters. Besides that, you can also use RealOpInsight’s Editor to combine imported items into a federated visualization item.
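A federated view can be sketched as merging the namespace items of several sources under their source IDs, then combining selected items into one. This is a hypothetical model: federate, combined_status and the source IDs prod-eu and prod-us are illustrative, and the "worst status wins" rule for combined items is one plausible choice, not necessarily the rule RealOpInsight’s Editor applies.

```python
from enum import IntEnum


class Status(IntEnum):
    """Hypothetical severity scale: a higher value means a worse state."""
    NORMAL = 0
    MINOR = 1
    MAJOR = 2
    CRITICAL = 3


def federate(sources):
    """Merge {source_id: {namespace: Status}} into {"source_id:namespace": Status}.

    Prefixing with the source ID keeps same-named namespaces (such as "default")
    from different clusters apart.
    """
    return {
        f"{source_id}:{namespace}": status
        for source_id, namespaces in sources.items()
        for namespace, status in namespaces.items()
    }


def combined_status(merged, item_ids):
    """Status of a federated item built from several namespace items (worst wins)."""
    return max((merged[item] for item in item_ids), default=Status.NORMAL)


if __name__ == "__main__":
    merged = federate({
        "prod-eu": {"default": Status.NORMAL, "shop": Status.MINOR},
        "prod-us": {"default": Status.CRITICAL},
    })
    print(sorted(merged))
    print(combined_status(merged, ["prod-eu:shop", "prod-us:default"]).name)
```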
Note also that you can configure RealOpInsight to send email notifications when the overall status of a namespace’s microservice tree changes from a normal to a non-normal state, and vice versa. See the menu Notification from the administrator home page.
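The notification rule described above only fires when a status crosses the normal/non-normal boundary. The sketch below illustrates that logic under stated assumptions: StatusChangeNotifier is a hypothetical class, the send callable stands in for an actual email sender, and escalations within non-normal states (for example MAJOR to CRITICAL) are deliberately not notified, which is one reading of "normal to non-normal, and vice versa".

```python
from enum import IntEnum


class Status(IntEnum):
    """Hypothetical severity scale; anything above NORMAL is non-normal."""
    NORMAL = 0
    MINOR = 1
    MAJOR = 2
    CRITICAL = 3


class StatusChangeNotifier:
    """Notify only when a view crosses the normal/non-normal boundary."""

    def __init__(self, send):
        self.send = send  # callable taking a subject line, e.g. a wrapper around an SMTP client
        self.last = {}

    def update(self, view, status):
        previous = self.last.get(view)
        self.last[view] = status
        if previous is None:
            return False  # first observation: nothing to compare against
        if (previous == Status.NORMAL) != (status == Status.NORMAL):
            self.send(f"[{view}] {previous.name} -> {status.name}")
            return True
        return False


if __name__ == "__main__":
    notifier = StatusChangeNotifier(print)
    for status in (Status.NORMAL, Status.MAJOR, Status.CRITICAL, Status.NORMAL):
        notifier.update("shop", status)
```

Keeping the last known status per view is what prevents a flood of emails while a namespace stays degraded.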
Folks, that’s all for this story. Enjoy and don’t hesitate to share feedback!