Goose, the open source load testing framework created by Tag1 CEO Jeremy Andrews, continues to show its performance and scalability capabilities. In this Tag1 Team Talk, Managing Director Michael Meyers joins VP of Software Engineering Fabian Franz for a demonstration of Goose's rapid ramp-up and scaling by CTO Narayan Newton.
In this final talk in our series of live Goose demonstrations, Narayan and Fabian break down why some of the methods used in part 2 weren't ideal, and show ways to make spinning up load tests faster and more efficient.
For more Goose content, see Goose Podcasts, Blogs, Presentations, & more!
Hello, and welcome to Tag1 Team Talks, the podcast and blog of Tag1 Consulting.
Speaker:Today we're going to be talking about distributed load testing, and doing a how-to and a deep dive into running Gaggles with Tag1's open source Goose load testing framework.
Speaker:Our goal today is to prove to you that Goose is both the most scalable load testing framework currently available, and the easiest to actually scale.
Speaker:This is a follow-up talk to one we did very recently on a similar topic. We're going to stress the servers even more in this one.
Speaker:I'm Michael Meyers, the managing director at Tag1 Consulting.
Speaker:And I'm joined today by Narayan Newton, Tag1's CTO, and Fabian Franz, our VP of Technology.
Speaker:Fabian, can you give us just a quick background on how this is
Speaker:a follow-up to our last talk?
Speaker:Sure. So in our last talk, what we did essentially was spin up EC2 instances all over the world. But if you need to change something, you essentially have to destroy the cluster and redeploy it, and while recording, we actually ran into the problem that we had to change something, and it wasn't easily possible to do that.
Speaker:We wanted to change the ulimit, because with Goose, if you run a lot of users, you usually need to increase the default ulimit that Linux comes with, and you need to do that in the VM, obviously. And we had no real control over that, because we only had the start script.
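(The ulimit in question is the per-process open file descriptor cap. A minimal sketch of checking and raising it on a Linux VM, assuming shell access to the node:)

```sh
# Show the current open-file limit; stock Linux often defaults to 1024:
ulimit -n

# Raise it for this shell session before launching a large Goose run:
ulimit -n 65536
```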
Speaker:So while the solution you presented was very straightforward, very simple, and easy to use, if you quickly want to iterate on something...
Speaker:Yeah.
Speaker:It can take quite a while, because you have to wait for all the clusters to shut down, and you really don't want any EC2 machines hanging around for ten years and thousands of dollars that you're paying for nothing because you ran a load test once. It's so important to cleanly shut down and then start up. But that costs a lot of time, and development time is also costly.
Speaker:So today we have a completely new solution, and I'm totally fascinated and excited by it. Narayan, please tell us more.
Speaker:Yes. If you watched the last talk, I spent an unfortunate amount of it talking about why I disliked the thing I built to spin up EC2 instances, because I couldn't control the endpoints. And then all the things that I was complaining about happened, and we had to stop recording.
Speaker:So I got annoyed. What we have today is similar to what we ran last time; it's still kind of the same Terraform tree, and we're spinning up CoreOS nodes in various regions. But instead of pushing just a Goose container to each of them to run the load test, it installs K3s, which is a Kubernetes distribution designed for IoT, CI, and running at the edge.
Speaker:It's very small. It actually is a full Kubernetes distribution, but it doesn't run everything the way a standard one does; they have made some changes to make it lighter and spin up faster.
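(For reference, K3s installs via a single script; this follows the documented quick-start pattern, with the server address and join token as placeholders:)

```sh
# On the manager (server) node:
curl -sfL https://get.k3s.io | sh -

# On each worker node, join the cluster using the server's address and token:
curl -sfL https://get.k3s.io | K3S_URL=https://<server-ip>:6443 K3S_TOKEN=<token> sh -
```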
Speaker:So now, instead of running terraform apply and it spinning up the load test, it spins up a multi-region Kubernetes cluster. That was interesting to do; there were some oddities, because when you're spinning up a node in EC2, it has an internal IP address and an external IP address. If you're spinning up a Kubernetes distribution in a single region, that doesn't really matter, because everyone's talking to each other via the internal IP address. But when you're doing multi-region, everyone's talking to each other via the external IP address, and that IP address does not appear on the VM at all.
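(K3s does let you declare both addresses explicitly at install time; a sketch using its node-address flags, which exist for exactly this internal/external split:)

```sh
# Advertise the public address even though it is not bound to any
# interface on the EC2 VM:
k3s server --node-ip <internal-ip> --node-external-ip <external-ip>
```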
Speaker:So that was interesting. I will share my screen. Where we were last time was basically spinning up 10 nodes: two nodes in each region, five regions. Before we started, I did that again, but with the new setup.
Speaker:Okay.
Speaker:So now we are on the manager node. kubectl gets set up automatically on the manager node.
Speaker:There is our cluster as it stands currently. I just ran a get nodes with wide output, so I can see extended fields. You can see that we have the control plane, which is what we are on currently, and then all of our worker nodes; you can see the internal IP addresses and the external IP addresses.
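(The command being run is the standard one:)

```sh
# List nodes with extended fields, including internal and external IPs:
kubectl get nodes -o wide
```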
Speaker:If you look at one of these, let's look at this one: you can see it's even tagged by the region it's in. So we're fetching the region during spin-up and tagging whatever region this node is in. These are just boring regions, but there are more interesting regions as well.
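(A sketch of inspecting those tags; the exact label key depends on what the Terraform tree applies, so the region key shown here is hypothetical:)

```sh
# Show a node's labels, including the region tag applied at spin-up:
kubectl get node <node-name> --show-labels

# Filter nodes by a region label (key and value illustrative):
kubectl get nodes -l region=ap-northeast-1
```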
Speaker:So to run the load test, I have a little YAML directory here. This is what we're doing instead of pushing the Docker image to each CoreOS node and just letting it run without any control: now you spin up the cluster and you can submit these jobs. So this is the manager job; it's going to ________ workers, but we want 10.
Speaker:But real quick, I got lost a little bit in the translation of things. So we have EC2 instances, and they now form this huge Kubernetes network. Now what are we using to deploy Goose, or is Goose already there?
Speaker:Goose is not there. What I'm copying up to the manager node is our deployment of Goose. It's a Deployment telling Kubernetes to spin up one replica of the Goose manager, and this is going to be the manager that all the workers connect to.
Speaker:Excellent.
Speaker:Perfect.
Speaker:So we are going to create that, and you can see it creating here.
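(Submitting the manifest is one command; the filename is illustrative:)

```sh
# Create the Goose manager deployment on the cluster:
kubectl apply -f goose-manager.yaml
```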
Speaker:And if we look back at that deployment, you'll see that we have a node selector to tell Kubernetes I want this to run on the manager node. I don't want it to run on the worker nodes, because this is the management instance of Goose.
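(A minimal sketch of what such a manager Deployment could look like. The name, image, labels, node-selector key, and Goose arguments are illustrative, not the exact manifest from the demo:)

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: goose-manager                # hypothetical name
spec:
  replicas: 1                        # a single manager for the whole gaggle
  selector:
    matchLabels:
      app: goose-manager
  template:
    metadata:
      labels:
        app: goose-manager
    spec:
      nodeSelector:
        node-role.kubernetes.io/master: "true"   # pin to the manager node (label assumed)
      containers:
      - name: goose
        image: registry.example.com/goose:latest  # placeholder image
        args: ["--manager", "--expect-workers", "10"]   # gaggle-mode flags
```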
Speaker:Excellent.
Speaker:So how can I now... can I now look at it? As soon as the Kubernetes one is up, can I see that it is waiting for workers? Can I see the output of that as well?
Speaker:You can, once it's done creating the container.
Speaker:So what it's doing right now is it's pulling the container down
Speaker:from our container registry.
Speaker:How long does container deployment usually take with Kubernetes?
Speaker:It scales by the size of the container.
Speaker:And our container is actually quite large, because I have not put the effort into making it smaller yet.
Speaker:Okay.
Speaker:So it's just downloading a few gigabytes of data for the distribution and all the rest of the dependencies, et cetera.
Speaker:Exactly.
Speaker:Okay.
Speaker:And now this is our Goose worker: same container, but different arguments to the container, obviously. We have some pod anti-affinity rules here, which are kind of interesting. Basically, this is me saying to Kubernetes: I don't want you to schedule this on any node that already has a worker running, and I don't want you to schedule it on any node that has the manager running. So it will distribute it to every single node, so that there won't be an inactive one.
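(A sketch of those anti-affinity rules as they would appear in the worker pod template; the labels are hypothetical:)

```yaml
# Never co-locate a worker with another worker or with the manager,
# so every node ends up running exactly one Goose pod.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: app
          operator: In
          values: ["goose-worker", "goose-manager"]
      topologyKey: kubernetes.io/hostname
```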
Speaker:Yeah.
Speaker:You wanted to see this.
Speaker:So, there. Before I start the workers, the manager is now running, and we can do a kubectl logs, and there are logs.
Speaker:Nice.
Speaker:and it's waiting for 10 workers.
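(Following the manager's output is one command; the deployment name is hypothetical:)

```sh
# Tail the Goose manager's logs; it reports that it is waiting for workers:
kubectl logs -f deployment/goose-manager
```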
Speaker:So let's create the workers.
Speaker:Oh, I fixed this on the other one. What it's complaining about is these pod affinity rules; it needs a topology key for each one. And that one's responsible.
Speaker:So this topology key essentially means that it's per hostname, not per something else?
Speaker:Per hostname instead of per zone, per region, per rack. So I could also decide to deploy just one worker to each region, essentially.
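(The topologyKey is what sets that domain; swapping it changes the spread. These are the standard well-known labels:)

```yaml
# Alternatives for the anti-affinity topologyKey:
topologyKey: kubernetes.io/hostname            # at most one pod per host (used here)
# topologyKey: topology.kubernetes.io/zone     # at most one pod per zone
# topologyKey: topology.kubernetes.io/region   # at most one pod per region
```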
Speaker:And now we have our workers spinning up.
Speaker:And so this is what was happening last time as well; it's just that it was happening when you ran terraform apply. It would bring up all the nodes, and then each node would be doing this, but without our direct interaction.
Speaker:I think I like the manager way more.
Speaker:It's nice to have a little bit of control and to easily take a look at logs.
Speaker:Yeah.
Speaker:We have already started to send users.
Speaker:That's great.
Speaker:Yep.
Speaker:As part of this, the ulimit issues we were hitting: in the old one, we weren't changing the ulimit, so we had a maximum of 1,024 open file descriptors. Here we are actually inheriting a ulimit fix that K3s pushes in when it installs, which is helpful.
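(That inherited fix lives in the systemd unit the K3s installer writes; an excerpt, with the limit value as commonly shipped, worth verifying against your install:)

```ini
# /etc/systemd/system/k3s.service (excerpt written by the K3s installer)
[Service]
LimitNOFILE=1048576   # raised open-file limit, inherited by the containers
```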
Speaker:That's indeed helpful if they have already solved the problem for us.
Speaker:So that's starting up, you can see. These are workers transitioning to the running state. We can look at the logs of the workers as well, which is kind of cool if you think that these workers are in various regions; I can just run something to pull logs from, like, central Europe.
Speaker:Japan is cooler than central Europe.
Speaker:Yeah,
Speaker:that's true.
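(Pulling a specific region's worker logs might look like this; the label and pod names are hypothetical:)

```sh
# Find the worker pods and the nodes (and therefore regions) they landed on:
kubectl get pods -l app=goose-worker -o wide

# Tail the logs of, say, the worker in ap-northeast-1:
kubectl logs -f <worker-pod-on-ap-northeast-1-node>
```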
Speaker:Okay.
Speaker:And our load tests should be starting.
Speaker:I think it was around 15 seconds, right? It's just coming in.
Speaker:Yeah, there we go.
Speaker:Oh my God.
Speaker:How fast do we have to ramp up right now?
Speaker:If you add a hundred at a time and keep going up, it crushes our site. So the ramp-up is slower, but it kind of needs to be.
Speaker:For sure.
Speaker:And actually, why don't we look at one of the workers so we can see the ramp-up?
Speaker:That's freakishly cool.
Speaker:This will all be open source on the Tag1 server?
Speaker:It's already there.
Speaker:One thing that you should know about it currently, and I'll just talk about this while it's ramping up: when you spin up a Kubernetes cluster, there's a backend network that all the pods communicate on. That is not a real network; it's a Kubernetes-specific network, and this particular distribution uses Flannel for that. It's even a pluggable thing, so you can decide what you want your network control plane to be. Flannel is detecting the wrong IP address; it's detecting the internal IP address, not the public IP address.
Speaker:I haven't personally fixed that, because there's a bug open about it with traction, and I'm going to push on the bug to fix it. So as it stands, if you want to just do something like this, where you're going to be using the host network, which is the network of the VM, not the network of the pod, that's not an issue. But if you want to do something like inter-pod communication, that will be an issue. So if you just pull it down, you would have to fix that. I'll probably dump the URI of the bug in question in the README, just so that people can track it.
Speaker:So if I wanted to do something like that, would I need to apply a patch, or what would we need?
Speaker:Yeah, in that issue there's a little deployment you can deploy that will go in and change the annotation on the nodes to point to the correct IP, and then Flannel will update.
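(Done by hand, that workaround amounts to overriding Flannel's public-IP annotation on each node; the key below is the one Flannel documents for this override, but verify it against the linked issue:)

```sh
# Point Flannel at the node's real public address (repeat per node):
kubectl annotate node <node-name> \
  flannel.alpha.coreos.com/public-ip-overwrite=<external-ip> --overwrite
```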
Speaker:So it's a bonus.
Speaker:So you had to get the external IPs manually and put them into the configuration, or...
Speaker:No... well, so we're at 20 gigabits. But while that's happening, where's a good place for that?
Speaker:Sure.
Speaker:So I am running curls against the EC2 metadata service to grab the public IP and the region before doing the install, and then passing them to the installer. Which is actually a huge security vulnerability if the wrong people get access to it, but it is very useful for setting things up. That's how an EC2 VM knows stuff about itself.
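(A sketch of those curls against the instance metadata service, using the classic IMDSv1-style paths:)

```sh
# Ask the EC2 metadata service about this instance:
PUBLIC_IP=$(curl -s http://169.254.169.254/latest/meta-data/public-ipv4)
AZ=$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone)
REGION=${AZ%?}   # drop the trailing zone letter (e.g. us-west-2a -> us-west-2)
```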
Speaker:Oh, we got our first error. I've noticed that at about 26 gigabits per second, you start seeing errors every once in a while.
Speaker:Is there a corollary chart to our Fastly bill here?
Speaker:Yeah, there really is. We've been testing this a while and, I don't know, it just never occurred to me that the Fastly bill would be high, but it is.
Speaker:Man, that's amazing.
Speaker:I think we should be at around 25 at this point. Let's see where we're at. Yes, we've launched all the users, so we should just sustain at around here.
Speaker:And this, people, is how you test a CDN, as you can see. We have fewer locations near the Asia-Pacific POPs, but we're holding five gigabits per second in Asia-Pacific, and then 10 and 10 in Europe and North America.
Speaker:Yep.
Speaker:And just again, to reiterate that a little bit: our Goose users are not real users; they are usually way faster, which is why they generate this much load. But every user is downloading everything. I mean, we have little breaks in there, but all of the users are also downloading all the assets. So when a page is loaded from Umami, like a nice recipe, which is what we are talking about, then all those images are also downloaded, like a real browser browsing the site. The JavaScript is not executed, because you're not doing that, but it's really parsing a lot; it's ensuring everything is correct.
Speaker:And you're doing this with 2,000 workers on 10 nodes, 20,000 workers in total, right?
Speaker:Yeah, users.
Speaker:Yeah. So 2,000 users per node across 10 nodes, two nodes in each region.
Speaker:Yeah.
Speaker:So those 20,000 users are now hitting this site real hard. A real user probably doesn't click as fast as we are clicking here, so it's probably more like 200,000 or so generating this kind of traffic.
Speaker:Amazing. Really cool.
Speaker:Oh, then we have an error. And this was the error we got. On our old setup, I would have no real way to dig into that: we were pushing logs centrally, and by the way, our bill for the central logging was also great, but there was no way to separate out and look at individual workers or anything like that. So this is a big improvement as far as manageability.
Speaker:Yeah, Datadog probably was not as amused with that many logs.
Speaker:No, they emailed me. "Can we schedule a call to discuss your new usage?" Sorry, it was just a one-off. We need to show the world how to test Fastly.
Speaker:What's nice about Goose, and what you can see here, is that every error is very nicely reported. Goose also just got a new patch in that allows us to get an overview of all the errors that ever happened, across all the workers and everything. So this will be a very nice new feature that launches in the next release: you don't have to look through the log for what errors you had; you'll get a report aggregated per error type at the end.
Speaker:And I think we can essentially stop; we can see Fastly is handling 25 gigabits per second easily. There's no real reason to make our bill higher. So what we can do is just delete the deployments and terminate them all.
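(The teardown is a single command; the deployment names are hypothetical:)

```sh
# Stop the load test by deleting both deployments:
kubectl delete deployment goose-worker goose-manager
```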
Speaker:Not the nicest way to stop a load test, but yeah.
Speaker:No, no, but it is very forceful.
Speaker:Yeah, in theory we could have also given the manager a stop signal, and then it would have given us a nice end-of-load-test report, which can also be in HTML. And we can show that again some other time.
Speaker:Oh, and one thing you can do with this, not right now, because they're terminating, but if you want to: you can actually exec into these containers. So you can pull up a bash prompt in any of these containers, even the ones in different regions. Not something I did; that's just Kubernetes.
Speaker:I mean, for sure.
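(For example, with a hypothetical pod name:)

```sh
# Open an interactive shell in any worker container, regardless of region:
kubectl exec -it <worker-pod> -- /bin/bash
```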
Speaker:No, I really love this new K3s. Is it an acronym for something, K3s?
Speaker:No. So the acronym for Kubernetes is K8s, so K3s would be their joke, as it's a lighter version.
Speaker:It is really cool. It's literally a single binary, and the install deals with all the prerequisites, then places the binary and spins up a systemd unit file that sets everything up. Kubernetes by default has an HA multi-instance data store; they replaced that with SQLite. That sort of thing.
Speaker:This feels really, really cool. And I think that's really nice, too, for running multi-region services so easily.
Speaker:Great.
Speaker:Yep.
Speaker:I think it's a step up from what we had before.
Speaker:Absolutely. Not only from what we shared before, but also from what you had to do before that. I remember SSH-ing into four different machines to start a Locust test, manually starting eight workers on each. So, yeah, it's really nice to have everything automated that way.
Speaker:And now I'm just bringing it down.
Speaker:Awesome.
Speaker:Thank you guys so much.
Speaker:That was really cool.
Speaker:I look forward to our Fastly and Datadog bills.
Speaker:I really appreciate you guys coming back to show us how that works. We will do another Goose webinar in our series here in a couple of weeks, so please stay tuned. We're going to make this a regular series where we show you how to use different features and functionality as we release them, and also show you how to use the tool to profile websites and effectively performance-tune, not just get it up and running and slamming your site with traffic.
Speaker:The links we mentioned we'll throw into the show notes and the description. You can check out these other Goose talks at tag1.com/goose, G-O-O-S-E; that's where we have links to documentation and code and all of the talks and videos that we've done.
Speaker:If you have any questions about Goose, the product, and how to use it, please head over to the GitHub issue queues and ask them over there. And if you want to get engaged and contribute, we'd love it. If you have input and feedback on the talk itself, please contact us at tag1teamtalks@tag1.com.
Speaker:Please remember to upvote, share, and subscribe with all your friends. You can check out our past Tag1 Team Talks at tag1.com/ttt, for Tag1 Team Talks. Again, a huge thank you to Fabian and Narayan for joining us, and thank you to everybody for tuning in.