Kubernetes Rightsizing and Performance Testing


Transcript
Ole Lensmar: Hello, everyone. Welcome to this episode of the Cloud Native Testing Podcast. I am super delighted to be joined by Yasmine today from CloudBolt. Did I get that right? Awesome. Great to have you. We talked a little bit before this recording and you told me about your background, which is super interesting from a testing aspect. Before we dig into that, please share a little bit about what you're doing at CloudBolt and your background there.
Yasmine: I'm the COO here at CloudBolt. CloudBolt acquired Stormforge, which I had been at for three years, in April. We've been here five years now. I'm currently looking across the portfolio. Stormforge was and is a Kubernetes rightsizing solution, and CloudBolt has a portfolio of products around hybrid cloud management. Think zero-to-one provisioning orchestration and everything around that, and then one-to-two, all your patch management and ongoing automation needs. We also have a FinOps product that provides an abstraction across all your cloud data, pulls in the costs, and then allows you to take action on it to reduce and rightsize. It's been a fun last five years. It's also been full circle for me. Before I was at Stormforge, I was at Puppet, which was infrastructure automation. I moved into the Kubernetes world, and now I'm in that hybrid world of everything. It's funny to hear some technologies that I used to work with continue to come up.
Ole Lensmar: You told me about the background of Stormforge, which originally was around performance testing, or there was at least some aspect of that. I'm intrigued by that and the segue to cost management. I'm also interested in talking about the overlap between performance testing and cost analysis and optimization, because those two things seem to be related.
Yasmine: The first product we launched, around 2019, was still for rightsizing Kubernetes configurations, but it was performance testing-based. We would use performance testing tools, whether ours or others, to generate a bunch of data. While the data was being generated, the machine learning would run experiments to see the application's profile from a resources standpoint. For a Java app, that could be garbage collection, heap, every knob you can turn. The machine learning would do experiments and then come up with your best configuration. You could choose to be more performance-based, improve your costs, or something in the middle. The machine learning would give you 20-30 configurations and say this is the most optimal based on what you asked for. We needed performance testing data for the experiments. To help with that, we acquired a performance testing tool called Stormforger, hence the name change to Stormforge in 2020. At that time, in 2020, there wasn't a lot of performance testing being done by Kubernetes teams and everything was still siloed. You had a platform engineer still learning Kubernetes, performance testing teams that maybe hadn't started doing Kubernetes performance testing, and developers who still didn't understand how their app related to the Kubernetes configurations under the hood. Since then, things have changed. I run into pretty mature performance testing teams that know how to test their Kubernetes workloads. We still have that dichotomy between the app teams and the platform teams. The app teams don't really think about Kubernetes configurations; they just deploy the app and set whatever was there or an average. Then the platform engineer says this doesn't work, leading to either performance or cost problems. We're still in that fun space, but now we just look at data live, pulling it from the platform. Performance testing is less of a requirement now. But when you are doing performance testing, we can just pull the data from that time. We still work a lot with performance testing.
Ole Lensmar: It feels like maybe you were a little bit ahead of the curve a couple of years ago, and then teams and organizations caught up as they became more familiar with Kubernetes and application deployments and resource management. I'm sure Kubernetes in itself has evolved a lot in that vein as well. It's evolving quickly. Something we get asked about is cost optimization and resource optimization related to testing, not necessarily load testing, but also if you're running massive end-to-end tests or functional tests. People ask us, can you help us minimize our test execution costs or what can we do? Is that something you see or saw at Stormforge and what different flavors of that did you see?
Yasmine: I've definitely seen it thematically, but obviously we're biased from the conversations that we're in. The angle I've seen it from is wanting to optimize the resources while the test is being run. I'm running a thousand jobs. Maybe those jobs are only 20 minutes, so they're not taking that much time, but during that time I had to provision a lot of resources to run those jobs. And I don't think I need that many resources, but I don't know what I need. Historically, we've taken the approach of, it's running for 20 minutes, do you really need to optimize it? But some of these larger organizations that are literally running thousands and thousands of jobs are like, no, I do need to optimize it. We get that level of conversation, but never something like, I'm running a three-hour long test and I want to optimize it to run shorter. That's just not the space that we've been in.
Ole Lensmar: We definitely see people running long-running tests, even over days. They're generating load over a weekend, just to see how the system handles that. But maybe when you run those kinds of tests, cost isn't that important. You're investing into making sure that your system can handle that load, so it's maybe not as much of a concern there.
Yasmine: In that example you just brought up, what I've seen is at those moments, what you're really testing for is, will my production environment be resilient? Anytime I have a conversation with an organization where they say cost is really important and we're trying to get costs down, I ask where it ranks in their initiatives. Resiliency and deploying to prod faster always trump costs. So running the long-running tests over the weekend, you want to make sure once you deploy the apps to prod, it'll be robust enough to take any increase in load. People care about costs, but not at the risk of performance. That's something that we've had to put a lot of tuning into, just making sure. I like to say we're a rightsizing tool that will improve your reliability, and the byproduct is that you reduce your costs.
Ole Lensmar: Have you seen concern about cost related to testing increase over the last few years as people grow more into Kubernetes, or has it been more of a steady low priority?
Yasmine: It's been low priority. It's more the cost of the testing environments themselves that comes up a lot.
Ole Lensmar: Another thing I wanted to ask: I'm guessing AI has some impact on what you're doing, both when it comes to cost optimization of AI workloads and maybe also using AI to do cost optimization. You were really early, from my understanding, with algorithms that use machine learning. How has the recent generative AI wave influenced that, or your approach to what you do?
Yasmine: It's funny because a lot of people ask if we're doing AI and can you explain how it works. We say what we're doing is machine learning; it's just advanced math. I don't want to make it sound so simple because it's advanced math I don't understand. But especially early on in the company's journey, we have had a bunch of PhDs working on the algorithm. We have three different algorithms we use depending on the profile of the data. At the end of the day, your requests and limits are just math. You want to make the requests and limits work, and it has to work with the HPA. With the HPA, you'll generally set a target utilization. How those two harmonize is where the hard math comes in. Once, one of our PhDs drew all the formulas on the board, and I was like, I definitely didn't take any of those math classes. But something we try and explain to people is that what we're doing is advanced math. It can't hallucinate because it's not generative. It doesn't need a bunch of inputs either. We collect about 20 or so metrics and we can provide recommendations really quickly, but the more data we have, obviously the better the recommendation is. That allows us to give every Kubernetes workload its own unique model. We can scale that without it being cost-heavy for us in the backend, because what we're doing is machine learning-based math. We're not using any large language models. Now, saying all that, what's changed is everybody is used to interacting with software or anything now with a chatbot. I can tell you, I ask questions all the time. So some stuff we're working on at CloudBolt as an interface to all of our products is a chatbot where you can ask, okay, tell me which of my workloads are set to auto-deploy, or what were my most over-provisioned workloads in the last two weeks. Technically, you can go into the UI and answer that question, or you can ask it from the CLI. But people like a chatbot. So that's the type of stuff that we're adding.
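To make the interplay between requests and the HPA target concrete, here is a minimal sketch of the standard scaling rule from the Kubernetes HPA documentation; this is not StormForge's recommendation algorithm, and the workload numbers are purely illustrative.

```python
import math

def desired_replicas(current_replicas: int,
                     avg_cpu_usage_millicores: float,
                     cpu_request_millicores: float,
                     target_utilization_pct: float) -> int:
    # The HPA measures CPU utilization relative to the container's *request*,
    # so changing the request changes when scaling kicks in.
    current_utilization_pct = 100.0 * avg_cpu_usage_millicores / cpu_request_millicores
    # Core HPA rule: desired = ceil(current * currentMetric / targetMetric)
    return math.ceil(current_replicas * current_utilization_pct / target_utilization_pct)

# Same workload, same load, two different request settings (illustrative values):
print(desired_replicas(4, 400, 500, 70))  # request 500m -> 5 replicas
print(desired_replicas(4, 400, 250, 70))  # request 250m -> 10 replicas
```

Rightsizing the request without also revisiting the target utilization therefore shifts scaling behavior, which is the harmonization problem the math has to solve.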
Ole Lensmar: That's super interesting. We're seeing the same on our end where engineers are using their IDEs, which have integrated LLMs like Copilot and Cursor, and they're interacting with their ecosystem of tools using MCP servers. To your point, it's a new way of interacting: instead of a CLI, you have more of a natural language way. In the experiments we've done, and we just launched our own MCP server, it provides some pretty astonishing capabilities at times. But to your point, often you could have done the same thing just by going to the dashboard; the difference is that you're doing it in one place and the LLM can then integrate it with other tools, right? It's not an isolated island of functionality. You can correlate it with other things. It's super interesting. I'm guessing it's similar for you: someone can do those kinds of analytics, maybe correlate that with some other thing they're doing, and then use the LLM to figure out, I'm seeing these changes in cost, what changes in our code could be related to that? I don't know. When I hear those things I'm like, yeah, right. But then I've tried it a lot myself and it's like, my god, it's actually working much better than I thought. It's not perfect, of course, but it does. So I do get the fascination.
Yasmine: I definitely don't always trust it. I always fact-check everything. But I think we're going to get to the point where that convenience factor is just what people expect. We're not there yet. People are still toying with it. It's a nice-to-have, but I do think we'll get to a point where the convenience of having everything in one place, being able to just ask across whatever tools and then have the answers, is like Google, right? Before Google, people would go look for things, but once Google came out, no one was going to research something separately; you just get it all at once. Or shipping, right? I used to go to the store. I rarely go to the store now. Once it hits that convenience factor, I think it's just going to be table stakes for any software. It'll be an interesting world.
Ole Lensmar: I'm looking at it from a software development perspective, but I'm guessing people might make a similar move from an ops perspective as well.
Yasmine: I feed it spreadsheets and an outline and say, here's what I'm looking to put together; give me an outline for a deck, or create the outline of slides for me. Busy work that I hate doing. It's great for that.
Ole Lensmar: Talk a little bit more about testing. Do you know how the teams at CloudBolt are testing the tools that you're building? You were at Stormforge for a while. Do you have insights into that? Is that anything you can share?
Yasmine: From a tooling standpoint, honestly off the top of my head, I don't know. At least on the Stormforge side, folks have built a completely automated test suite. What's hard for us is the machine learning. Every environment is different; it's a snowflake. One environment might be using the HPA on CPU. Another might be using the HPA on memory, or CPU plus memory, or a custom metric, or KEDA. That's maybe 15 different combinations you could put together. My math is probably wrong, I'm not the math person around here, but that's just one use case of our product that we need to make sure we're testing for. Every time we make a change to the algorithm, there's a lot of trust that the customer puts in us, because we're making these changes live in production, so we have to run a ton of tests. The interesting part for us is the testing environments, because EKS is different from OpenShift, right? And that comes with its own fun hurdles. That's an opportunity for us: how do we consolidate and make sense of all the different testing environments we need? We run our own stuff internally, which has helped us save costs, but it also means we catch things before customers do. Now that we're in the CloudBolt world, we're trying to take that approach to the rest of the portfolio: how can we build some of that automated testing across the other products?
Ole Lensmar: The number of permutations must be huge. I'm also thinking, do you test for different Kubernetes versions? Because I'm guessing that compared to older versions, the underlying mechanics have improved, but also that new features have been introduced that relate to resource allocation, etc. So even that must be a dimension. I'm sure you have customers that are still on really old versions of Kubernetes; not everyone is as quick to adopt new versions of technology as you'd like them to be.
Yasmine: For sure. Some of our customers are very large banks and as you can imagine, they don't upgrade that quickly. So we have a wide support matrix. And then Kubernetes comes out with new features like recently the in-place pod resizing, which everyone's been super excited about. We're also excited about it, but if you actually look at the feature, there's a ton of caveats. So when you're testing that, you got to make sure, okay, in this environment, we know it's going to work. In this environment, we know it's not going to work. It's in the docs that it's not going to work. So having the automation detect those things versus a human having to go run a test and then make sure it works has been a fun challenge.
Ole Lensmar: I can imagine. One thing that we've talked about with others in previous episodes is this topic of testing in production, which, when you go to a QA conference, is usually treated as a bit of an anti-pattern by many people, but ultimately, almost all the people we've talked to here do it in one way or another. I'm curious, in your space specifically, you said you're using the product internally to optimize your own workloads. I'm guessing that is a way of testing in production, or do you do that in other ways as well?
Yasmine: We do that in some other ways too. From a UI standpoint, we have a bunch of feature flags in the way that we deploy some of the products. Obviously that gets automated tests throughout the pipeline, but then we can see it with live real data with either our own data or in a customer environment that we've talked to them about, hey, we're going to turn this on, let's get feedback, let's test it there. We have a little bit of that that we'll do from a feature flag standpoint, and then have the ability to turn it off as needed. Our internal systems are our production because our customers all run on that. So that is essentially testing in production because we get it before the customers get it. So yes-ish, but we don't do any large type of load testing in production, anything like that. That's still non-prod for us.
Ole Lensmar: Just a reflection on the concept of cost and resource optimization. It seems like that's something everyone should be doing. But they're not. Maybe it's like testing: everybody should be testing more, but they're not. Is there some inherent challenge that makes it hard for people to adopt? Or is it just that they're not aware? Or, to your point, is it a low-priority thing? Are there other things that are more important, or what do you think is hindering broader adoption of those kinds of solutions?
Yasmine: It's probably two things. One is a people challenge and the other is just that scale challenge. The people challenge is that the folks responsible for changing the resources are the platform team, the Kubernetes engineers, DevOps, that group. But the people that set them and are impacted by them are the developers. Now you have non-shared incentives, because the platform team's like, hey, we've got to rightsize. Either you're undersized or you're oversized. We've got to make a change to the workload. The development team's like, how do I know you're not going to break my workload? That's obviously a huge oversimplification, but at the root of it, that's what always happens in any environment. Sometimes we'll even talk to platform engineers who are like, it doesn't matter. We're just going to turn this on and it's going to be opt-out. And then the developers have to take the changes. I'm like, you can take that approach, but almost always what happens is they deploy it out and something happens that maybe doesn't even have to do with the product itself, but it then opens a conversation of, well, I didn't have this before. So if you don't have that developer buy-in, it's really hard. Unless you're a small org and you can just flip the switch and run it everywhere. But especially in the larger organizations, it comes down to people. And sometimes people don't like talking to each other. Sadly, it often comes down to just that. So we try to broker that one by encouraging the different groups to talk to each other. I feel like I'm back in my Puppet days of DevOps: hey, just talk to each other. But then the other part is driving that through software, making sure developers can interact with the software via annotations, like they're used to for the rest of their deployment. They can do their Kubernetes rightsizing configurations via annotations too, and build that confidence, and we give them control over the machine learning. Humans like to make sure they can constrain things that would otherwise happen outside their control, so we give them the ability to control every single configuration. I'd say that's the people side of the challenge. And then the scale part is that, at the end of the day, you're just changing three settings. You're changing requests, limits, and potentially target utilization on the HPA. I guess requests and limits apply to both CPU and memory, so now we're up to what, like five. Five times how many workloads do you have? 20,000, 50,000. We have customers with 200,000. That is a hard math problem. What makes sense today, and today's Friday, right, doesn't necessarily make sense tomorrow from a resource profile standpoint. If you just set an average, then you're going to be wrong part of the time, because you set an average. So figuring out the right setting for the right time and then automatically deploying it just becomes a scale challenge that some people try to solve themselves. We often get pulled into the conversation when an organization's like, yeah, we know we need to cut down our costs. We know we're super over-provisioned in our Kubernetes environment. We wrote some scripts, tried to take some averages, and it was just too painful. And it didn't work, because it broke something when the HPA then came in and scaled. So it's always a good time to have that conversation, because they feel the pain; otherwise it's just a daunting task of, well, can I even do this? Is this possible?
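A quick back-of-the-envelope version of the scale arithmetic above, using the illustrative workload counts mentioned in the conversation:

```python
# Per-workload knobs as described: requests and limits for CPU and memory,
# plus a target utilization on the HPA's scaled metric.
settings_per_workload = 2 * 2 + 1  # = 5

for workloads in (20_000, 50_000, 200_000):
    total = workloads * settings_per_workload
    print(f"{workloads:>7,} workloads -> {total:>9,} settings to keep right")
```

At 200,000 workloads that is a million settings, each with a time-varying right answer, which is why script-and-average approaches tend to break down.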
Ole Lensmar: One thought, I'm obviously approaching this from a testing angle once again, seems like testing could be an important piece here. When you introduce a tool like this as a way to ensure that things continue to work, right? So if your application team has thorough end-to-end tests or functional tests, and then you turn on CloudBolt and you run those test suites, you'll both see, hopefully, that things don't break, but you'll also see maybe performance profiles of those tests going either up or down. Do you see testing being used to validate that a tool like this does what it's supposed to do and doesn't break things?
Yasmine: Yeah, we always encourage people to run this in a real environment, right? Don't run in production on day one, unless you really want to. We've had some people say, no, we want to run it there. I'm like, I don't know. Okay. But we encourage them to run it in a real environment and throw performance testing data at it. Don't put it in a dev environment that has no traffic, where nothing's happening. Then what results are you going to get? The machine learning is going to say, you're not using this environment, shut it down. So we definitely encourage that. I think testing is a key part here, and it's a lot nicer now than it was in 2020, when it was like, wait, testing what? How do I do this in Kubernetes? I have yet to run across an organization that doesn't have at least some way to test their Kubernetes environment. So that's been a big help.
Ole Lensmar: Right, automated tests, both performance tests, like you said, but also just functional tests, end-to-end integration test suites that can be run. I guess those could also be ways to experiment with different CloudBolt configurations to see how they impact the performance or functionality of the target system. Okay, great. That's super interesting. Any final words from your end on cost optimization or testing or something in between?
Yasmine: Boy, final words. That's a lot of pressure. Honestly, I think the maturity of the ecosystem now has been really nice to see. I feel like I say this every time I'm at KubeCon talking to people: every year, the conversations are just dramatically more mature than the year before. Before, people didn't have performance testing tools. Then it was, wait, why would I even rightsize? My environment isn't that big. Why is this a problem? Then it became, yeah, I've tried this and I couldn't do it, so I just kind of let it be as it is. And now, this year, we'll see. I can't predict the future, but the market conditions are interesting. The economy is interesting right now. More and more we're seeing people tightening their belts, so it'll be interesting to see those initiatives rise to the top in importance and really be the driver that helps people go in and automate their rightsizing.
Ole Lensmar: Actually, a question just occurred to me here. Is there an inflection point when it's time to start using a tool like CloudBolt? Often people start testing late, or too late at least, and you'd always say we should be writing some basic tests from the beginning. Is it the same thing here: is there an advantage to putting a tool like this in place really, really early, even if, to your point, you're not really using the capacity that you have, so that as you grow it will help guide you as your application scales? Do you have a recommendation there?
Yasmine: We typically have a sizing threshold, like a thousand vCPUs. Once you're at that point, it's like, okay, it makes sense, you want to be rightsizing. But the reality is that even in a small environment, you want to make sure your requests are set correctly, not for cost, but for reliability. We've talked to a lot of very small orgs, products that are just about to launch and haven't even deployed to prod yet, and they don't have their requests set at all. If you don't have your requests set, you don't have any resources promised to you. So rightsizing is exciting from a cost standpoint, but at the end of the day, it's more about making sure your requests are set correctly, whether that means not setting them too high or not setting them too low. So starting very early is super important for anyone.
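For readers less familiar with the requests and limits she is referring to, here is a minimal sketch built with the official Kubernetes Python client; the workload name, image, and values are hypothetical.

```python
from kubernetes import client

# Without the "requests" block the scheduler sets aside nothing for this container,
# which is the reliability risk described above; "limits" is the hard ceiling.
container = client.V1Container(
    name="api",                                       # hypothetical workload
    image="registry.example.com/api:1.0",             # hypothetical image
    resources=client.V1ResourceRequirements(
        requests={"cpu": "250m", "memory": "256Mi"},  # capacity scheduling is based on
        limits={"cpu": "500m", "memory": "512Mi"},    # ceiling enforced at runtime
    ),
)
print(container.resources)
```

In this framing, rightsizing is about keeping those few values correct over time rather than leaving them unset or set to a one-time guess.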
Ole Lensmar: Just like with testing. Perfect. Thank you so much, Yasmine. It's been a pleasure having you on this episode. Thank you to everyone listening, and we look forward to a future episode. Bye bye. Thank you, Yasmine.
Yasmine: Thanks for having me.