Test-Based Monitoring for Kubernetes Apps

October 12, 2023
21:22
Ale Thomas
Developer Advocate
Testkube

Transcript

One of the essential components of modern monitoring strategies is the status page, which usually means building an additional, intricate system of its own or relying on third-party software that provides this type of report. But what if you could leverage your existing Kubernetes infrastructure and tests to build one?

Join us for Testkube’s Office Hours as we talk you through our new feature which lets you use your existing tests to monitor the status of your applications by creating an internal or public Status Page for monitoring - and get ready to start providing your users with real-time feedback and incident management!

Who should attend?

DevOps Engineers, Software Developers, QA Engineers, SREs, or anyone in between. This event is perfect for those who wish to dive deeper or are curious about the world of testing applications in Kubernetes.


Oh, we're live! I kind of froze because I was getting a notification over here. Hello everyone, and welcome to yet another Testkube office hours. I am very happy to open the office door again to you and talk about everything that's going on in the testing world, specifically with Testkube and Kubernetes applications. If you're new here, Testkube is a Kubernetes-native testing framework that helps you orchestrate, run, execute, and manage your tests and test results for all of your applications running in Kubernetes. Today we have a really interesting topic: test-based monitoring for Kubernetes apps. We're going to talk a little bit about what a normal approach to monitoring applications and server status looks like, and then contrast it with our suggested approach using your existing tests. Our agenda is pretty straightforward: we are going to discuss what status pages are for monitoring in the software world and how to build a status page with Testkube using your tests.

First of all, let's talk a little bit about what status pages are in the world of software. You're probably familiar with them as a developer, a DevOps engineer, or a tester. You have likely heard this term or interacted with status pages before. A status page is a web-based interface that provides real-time information about the status of your systems or applications. Most popular software will have a status page set up to inform both the users and the team of the overall health of the system. Ideally, on a status page, you want to see if your systems are up and running, if there are any ongoing incidents or issues with the software, or if there is going to be a delay because certain systems are undergoing maintenance. Nowadays, status pages are usually built through third-party observability systems. Today, however, I want to show you how to do that with your existing tests so you don't have to build an intricate system from scratch. You can just use what you already have, which are your existing tests for your apps.

I'm going to show you how to do that with Testkube. As always, let me share the documentation. You can head over to our documentation, where we have everything you will need to get started and follow along as I walk you through it. Let me just share my screen. It is a bit messy here with this dual monitor situation. We are now going to navigate to cloud.testkube.io and sign in.

This is my environment for Testkube in Testkube Cloud. As you can see, I already have a bunch of tests running normally, checking on the status of my applications. What you ideally want to do is use what you already have to your advantage when creating a status page. With Testkube's new feature, you can configure your very own status page in this tab, and I'm going to show you how simple and easy it is. Let's head over here. I already have one for my app, but you can simply create a new status page by adding a name. I named mine "my app," but we can change the name to "Ale's status page." Then you can add a description. Remember that this page can be either private or public, so you want to be concise and share relevant information with your users if you make it public. Let's add "Ale's app" as the description and save it, which should update immediately.

Immediately after you create your new status page, you will get a URL or an endpoint where it is exposed. When you first click on it, it will just show a random number slug, but you can change that slug to create your own custom one. You can set it to public to expose it so that it is accessible by anyone outside of your organization, or you can switch it back to private immediately. Private status pages can only be seen by logged-in users within your organization if you just want to keep this communication inside your team. You can also modify the time scale for your status page to display in either hours or days.

Let's head over to creating our services. Under the services tab, I have already created three different systems that I am testing with load tests and API tests. For example, let's assume you have a payment service or a whole web app that you want to test after making changes. To create a new service, you just provide a service name. In my setup, I have a "portfolio status" service, which represents my entire web app, where I run a URL test to ensure it comes back with the expected response. I also have a "payment service" that I check with a load test to see how it handles heavy loads, and a "scheduling service" where I run an API test to make sure the calls are working correctly and the API is in good health. Let's create a new service and call it "calendar system." Once added, it is very easy to link a test to it. You just select the test that targets that part of your application (for instance, an end-to-end Cypress test or another curl test) and then hit save.
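
To make the service-to-test relationship concrete, here is a minimal Python sketch of how one might model it. It is only an illustration of the idea from the walkthrough, not Testkube's data model or API: the `Service` class, the `link_test` method, and all test names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    """One row on a status page, backed by existing tests (hypothetical model)."""
    name: str
    test_names: list[str] = field(default_factory=list)

    def link_test(self, test_name: str) -> None:
        # Linking is idempotent: the same test is never listed twice.
        if test_name not in self.test_names:
            self.test_names.append(test_name)


# Service names follow the walkthrough; the test names are made up.
services = [
    Service("portfolio status", ["portfolio-curl-test"]),
    Service("payment service", ["payment-k6-load-test"]),
    Service("scheduling service", ["scheduling-api-test"]),
]

# Create the new service, then link a test that targets it.
calendar = Service("calendar system")
calendar.link_test("calendar-cypress-e2e")
services.append(calendar)

for service in services:
    print(f"{service.name}: {', '.join(service.test_names)}")
```

The point of the sketch is that a service carries no monitoring logic of its own. It is just a name plus references to tests that already exist.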

When you view the public status page, you will see a clean breakdown of all your systems. If you set it to an hourly display, it will show whether your system is operational, down, or experiencing a temporary outage based entirely on your test results. Everything here tracks your test history; if the test runs successfully, we assume the system is working. If a system goes down, you can track exactly which test failed, whether it was the curl test, the k6 load test, or an API check.
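
As a rough sketch of how test history could be turned into an hourly status, the Python below groups runs by hour and labels each hour. The shape of a run and the labeling rule (all passed, none passed, mixed) are assumptions made for illustration. The talk does not describe how Testkube actually computes these states.

```python
from collections import defaultdict
from datetime import datetime

# A run is (finished_at, test_name, passed). Hypothetical shape, not Testkube's.
Run = tuple[datetime, str, bool]


def hourly_status(runs: list[Run]) -> dict[datetime, str]:
    """Group runs by hour and label each hour from its pass/fail mix."""
    buckets: dict[datetime, list[bool]] = defaultdict(list)
    for finished_at, _test, passed in runs:
        hour = finished_at.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(passed)

    status: dict[datetime, str] = {}
    for hour, results in sorted(buckets.items()):
        if all(results):
            status[hour] = "operational"       # every run in the hour passed
        elif not any(results):
            status[hour] = "down"              # every run in the hour failed
        else:
            status[hour] = "temporary outage"  # assumed label for a mixed hour
    return status


def failing_tests(runs: list[Run]) -> set[str]:
    """Names of the tests that failed at least once, to see what to dig into."""
    return {test for _at, test, passed in runs if not passed}


runs = [
    (datetime(2023, 10, 12, 10, 5), "portfolio-curl-test", True),
    (datetime(2023, 10, 12, 10, 10), "portfolio-curl-test", True),
    (datetime(2023, 10, 12, 11, 5), "portfolio-curl-test", True),
    (datetime(2023, 10, 12, 11, 10), "payment-k6-load-test", False),
    (datetime(2023, 10, 12, 12, 5), "payment-k6-load-test", False),
]

for hour, label in hourly_status(runs).items():
    print(hour.strftime("%H:00"), label)
print("failed:", failing_tests(runs))
```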

You can also report incidents and scheduled maintenance so your users know if there will be any functional delays. Under the incidents tab, you can create a new incident. For example, I previously added one when my API was undergoing maintenance. If I report that my portfolio is failing, I can create a new incident called "temporary outage" and assign a severity level, such as critical. You can save it as a draft or publish it immediately. The incident description box supports markdown, meaning you can add formatting, indentation, and even emojis. You can specify the exact start time and select whether the incident is still ongoing or has ended. Once you create the incident, it is published immediately and appears on the status page history for your users to see. You can also edit or archive these incidents at any time to change their severity or update the information.
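
The incident fields mentioned above can be summarized in a small Python sketch. This is an assumed model for illustration only: the class and field names are hypothetical, and only the "critical" severity comes from the walkthrough. The other severity levels are placeholders.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class Severity(Enum):
    MINOR = "minor"        # placeholder level, not named in the talk
    MAJOR = "major"        # placeholder level, not named in the talk
    CRITICAL = "critical"  # the level used in the walkthrough


@dataclass
class Incident:
    title: str
    severity: Severity
    description_md: str                  # markdown text for the description box
    started_at: datetime
    ended_at: Optional[datetime] = None  # None means the incident is ongoing
    published: bool = False              # False models "saved as a draft"
    archived: bool = False

    @property
    def ongoing(self) -> bool:
        return self.ended_at is None

    def visible_on_status_page(self) -> bool:
        # Assumption: drafts and archived incidents are hidden from users.
        return self.published and not self.archived


incident = Incident(
    title="Temporary outage",
    severity=Severity.CRITICAL,
    description_md="**Portfolio** is failing. We are investigating.",
    started_at=datetime(2023, 10, 12, 21, 0, tzinfo=timezone.utc),
)
incident.published = True
print(incident.title, incident.severity.value, "ongoing:", incident.ongoing)
```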

Another cool feature coming up is the ability to link this to PagerDuty so your on-call engineers can access the results directly from your status pages. Users will also be able to subscribe to these status pages to get updates, and you can sign up now to be notified when that feature launches.

This setup ideally leverages your existing automated tests, so you don't have to run anything manually to generate reports. Changing the view from hours to days allows you to look at your historical data over a longer period to see how your application performs over time. If a test fails, you can easily dive in to see what went wrong. It is incredibly easy to customize, manage, and build using what you already have. If you connect your tests to your CI/CD pipeline, a test will trigger every time there is a code change. In this case, I have scheduled these smoke tests and load tests to run every five minutes using a standard cron schedule, which automates the entire monitoring process.
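
In standard cron syntax, "every five minutes" is written `*/5 * * * *`. The Python sketch below only illustrates when such a schedule would fire. It is not how Testkube schedules tests, and the `next_runs` helper is a hypothetical name. It hard-codes a five-minute step, which divides the hour evenly, so it is not a general cron parser.

```python
from datetime import datetime, timedelta

CRON_EVERY_FIVE_MINUTES = "*/5 * * * *"  # minute, hour, day, month, weekday
STEP_MINUTES = 5


def next_runs(after: datetime, count: int) -> list[datetime]:
    """Return the next `count` fire times strictly after `after`."""
    # Start at the next whole minute, then advance to a multiple of five.
    candidate = after.replace(second=0, microsecond=0) + timedelta(minutes=1)
    while candidate.minute % STEP_MINUTES != 0:
        candidate += timedelta(minutes=1)
    return [candidate + timedelta(minutes=STEP_MINUTES * i) for i in range(count)]


for run_at in next_runs(datetime(2023, 10, 12, 12, 2, 30), 3):
    print(run_at.strftime("%H:%M"))
```

With a five-minute schedule, each hourly column on a status page would be backed by roughly twelve runs per linked test.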

The major advantage of linking status pages to your test results is gaining real-time visibility. These pages provide live insights into the health of everything running in your cluster, your apps, and your connected APIs. If an outage occurs, you get that information fast, allowing you to minimize downtime. It is also fantastic for transparency, allowing you to share a public-facing view across your team or with your customers. This helps build trust with stakeholders, as they can verify your application's uptime and performance on their own at any time. Furthermore, you gain long-term access to historical data to verify if your system is performing up to your quality standards over time. We are also working on adding full customization so these pages can follow your own corporate branding.
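
One way to check historical data against a quality standard is to compute a daily pass rate and flag the days that fall below a target. The Python sketch below assumes a simple history of (timestamp, passed) pairs and a made-up 99% target. Neither comes from Testkube.

```python
from collections import defaultdict
from datetime import date, datetime

UPTIME_TARGET_PERCENT = 99.0  # hypothetical quality bar


def daily_uptime(runs: list[tuple[datetime, bool]]) -> dict[date, float]:
    """Percentage of passing runs per calendar day."""
    totals: dict[date, list[int]] = defaultdict(lambda: [0, 0])  # [passed, total]
    for finished_at, passed in runs:
        day = finished_at.date()
        totals[day][0] += int(passed)
        totals[day][1] += 1
    return {day: 100.0 * p / t for day, (p, t) in sorted(totals.items())}


def days_below_target(
    uptime: dict[date, float], target: float = UPTIME_TARGET_PERCENT
) -> list[date]:
    """Days whose pass rate is under the target."""
    return [day for day, percent in uptime.items() if percent < target]


history = [
    (datetime(2023, 10, 11, 9, 0), True),
    (datetime(2023, 10, 11, 9, 5), False),
    (datetime(2023, 10, 11, 9, 10), True),
    (datetime(2023, 10, 11, 9, 15), True),
    (datetime(2023, 10, 12, 9, 0), True),
    (datetime(2023, 10, 12, 9, 5), True),
]

uptime = daily_uptime(history)
print(uptime)
print("below target:", days_below_target(uptime))
```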

That is pretty much it. I will share the documentation for status pages in the comments, and if you have any questions, feel free to leave them down below. We also have a great learn page that covers how to get started and explains why using your existing tests for monitoring is a great way to keep your operations in-house rather than relying on a third-party system. Feel free to join our Discord to ask questions if you need help getting set up or understanding how it works; we are happy to chat and help you get Testkube running in no time.

Make sure to follow us to stay on top of our upcoming office hours. We will be back in a couple of weeks to show you what else the team is up to. I think test-based monitoring is an amazing feature that will greatly improve how you monitor your systems and communicate health status to your users. This feature is already live; you just need to sign into cloud.testkube.io to start creating your own status pages. Thank you for joining, and I'll see you in a couple of weeks. Bye-bye!