How to load test with Goose- Part 3: Bigger instances - Tag1 Team Talk
Episode 75 • 19th July 2021 • Tag1 Team Talks | The Tag1 Consulting Podcast • Tag1 Consulting, Inc.
00:20:49


Shownotes

Goose, the open source load testing framework created by Tag1 CEO Jeremy Andrews, continues to show its performance and scalability capabilities. In this Tag1 Team Talk, Managing Director Michael Meyers joins VP of Software Engineering Fabian Franz for a demonstration of Goose’s rapid ramp-up and scaling by CTO Narayan Newton.

In this final talk in our series of live demonstrations of Goose, Narayan and Fabian break down how some of the methods used in part 2 weren’t ideal, and show ways to make spinning up load tests faster and more efficient.


For more Goose content, see Goose Podcasts, Blogs, Presentations, & more!


Transcripts

Speaker:

Hello, and welcome to Tag1 Team Talks the podcast and blog of Tag1 Consulting.

Speaker:

Today,

Speaker:

we're going to be talking about distributed load testing and doing

Speaker:

a how-to deep dive into running Gaggles with Tag1's open source

Speaker:

Goose load testing framework.

Speaker:

Our goal today is to prove to you that Goose is both the most scalable load

Speaker:

testing framework available currently

Speaker:

and the easiest to actually scale.

Speaker:

This is a follow-up

Speaker:

talk to one we did very recently on a similar topic.

Speaker:

We're going to stress the servers even more in this one.

Speaker:

I'm Michael Meyers, the managing director at Tag1 Consulting.

Speaker:

And I'm joined today by Narayan Newton, Tag1's CTO, and Fabian

Speaker:

Franz, our VP of Technology.

Speaker:

Fabian, can you give us just a quick background on how this is

Speaker:

a follow-up to our last talk?

Speaker:

Sure. So in our last talk, what we did is essentially spin up EC2

Speaker:

instances all over the world. But if you need to change something, you

Speaker:

essentially have to destroy the cluster and redeploy it.

Speaker:

And while recording, we actually ran into the problem that we had to

Speaker:

change something, and it wasn't easily possible to do that.

Speaker:

And we wanted to change the ulimit, because with Goose, if you put up

Speaker:

a lot of users, you usually need to increase the ulimit that Linux

Speaker:

comes with, and you need to do that in the VM, obviously.

Speaker:

And we had no real control over that, because we only had the start script.

Speaker:

So while the solution that you presented was very straightforward, very

Speaker:

simple and easy to use, essentially, if you quickly want to iterate on something,

Speaker:

Yeah.

Speaker:

It can take quite a while because you have to wait for all the clusters to shut down

Speaker:

and you really don't want any EC2 machines hanging around for 10 years and then

Speaker:

thousands of dollars that you're paying for nothing because

Speaker:

you ran a load test once.

Speaker:

It's so important to cleanly shut down and then start up.

Speaker:

But that costs a lot of time and development.

Speaker:

Time is also costly.

Speaker:

So today we have a completely new solution, and I'm totally

Speaker:

fascinated and excited by it.

Speaker:

And Narayan, please tell us more.

Speaker:

Yes.

Speaker:

If you watched the last talk, I spent an

Speaker:

unfortunate amount of it talking about why I disliked the thing I built to

Speaker:

spin up EC2 instances, because I couldn't control the endpoints.

Speaker:

and then all the things that I was complaining about happened,

Speaker:

we had to like stop recording.

Speaker:

So I got annoyed. What we have today is similar to what we ran

Speaker:

last time, where it is still like, kind of the same Terraform tree.

Speaker:

And we're spinning up CoreOS nodes in various different regions, but instead of

Speaker:

pushing just a Goose container to each of them to run the load test it is installing

Speaker:

K3s which is a Kubernetes distribution.

Speaker:

that is designed for IoT, CI, and running at the edge.

Speaker:

It's very small.

Speaker:

It actually is a full Kubernetes distribution, but it's not a

Speaker:

full Kubernetes distribution in that it's not running everything a standard one does.

Speaker:

they have made some changes to make it lighter and spin up faster.

Speaker:

So now, when you run terraform apply, it's not just

Speaker:

spinning up the load test.

Speaker:

It spins up a multi-region Kubernetes cluster which was interesting to do

Speaker:

there were some oddities to doing that because each node, when you're spinning

Speaker:

up a node, EC2, it has an internal IP address and external IP address.

Speaker:

and if you're spinning up a Kubernetes distribution in a single

Speaker:

region, that doesn't really matter because everyone's talking to each

Speaker:

other via the internal IP address.

Speaker:

But

Speaker:

when you're doing multi-region, everyone's talking to each other via

Speaker:

the external IP address, and that IP address does not appear on the VM at all.

Speaker:

So that was interesting, I will share my screen.

Speaker:

Before we started: where we were last time was basically spinning up 10 nodes,

Speaker:

two nodes in each region, five regions.

Speaker:

I did that again, but with the new tree.

Speaker:

Okay.

Speaker:

So now we are on the manager node.

Speaker:

kubectl gets set up automatically on the manager node.

Speaker:

So.

Speaker:

There is our cluster as it stands currently.

Speaker:

So I just ran a kubectl get nodes -o wide so I can see extended fields.

Speaker:

And you can see that we have the control plane, which is what we are on currently.

Speaker:

And then all of our worker nodes, you can see the internal IP addresses

Speaker:

and the external IP addresses.
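For readers following along, the inspection Narayan does here is standard kubectl; this is a sketch (the node name is hypothetical, and kubectl is assumed to be preconfigured on the K3s manager node):

```shell
# List nodes with extended fields, including internal and external IPs.
kubectl get nodes -o wide

# Inspect one node's labels, e.g. the region tag applied at spin-up
# ("worker-us-west-1" is a hypothetical node name).
kubectl describe node worker-us-west-1
```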

Speaker:

If you look at one of these, like, let's look at probably this one.

Speaker:

You can see it's even tagged by the region it's in.

Speaker:

So we're fetching the region at spin-up and tagging the node with

Speaker:

whatever region it is in.

Speaker:

Yeah, these are just boring regions, but there are more

Speaker:

interesting regions as well.

Speaker:

So to run the load test,

Speaker:

I have a little YAML directory here, and this is what we're

Speaker:

doing instead of pushing the Docker image to each CoreOS node.

Speaker:

Okay.

Speaker:

Instead of just letting it run without any control,

Speaker:

now you spin up the cluster and you can submit these jobs.

Speaker:

So this is the manager job it's going to ________ workers, but we want 10,

Speaker:

but real quick.

Speaker:

I got lost a little bit in the translation of things. So we have

Speaker:

EC2 instances, and they now have this huge Kubernetes network.

Speaker:

And what are we using now to deploy Goose, or is Goose already there?

Speaker:

Goose is not there.

Speaker:

So this, which I'm copying up to the manager node, is our deployment of Goose.

Speaker:

So this is a Deployment telling Kubernetes

Speaker:

to spin up one replica of the Goose manager, and this is going to be the

Speaker:

manager that all the workers connect to.

Speaker:

Excellent.

Speaker:

Perfect.

Speaker:

So we are going to create that

Speaker:

and you can see it creating here.

Speaker:

And if we look back at that deployment, you'll see that we have a node

Speaker:

selector to tell Kubernetes that I want this to run on the manager node.

Speaker:

I don't want it to run on the worker nodes because this is the

Speaker:

management instance of Goose.
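A minimal sketch of what such a manager Deployment could look like. The image name, labels, node-selector label, and Goose arguments are assumptions for illustration, not the actual manifest from the demo:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: goose-manager
spec:
  replicas: 1                    # one manager that all workers connect to
  selector:
    matchLabels:
      app: goose-manager
  template:
    metadata:
      labels:
        app: goose-manager
    spec:
      nodeSelector:              # pin the pod to the manager node
        node-role.kubernetes.io/master: "true"    # hypothetical label
      hostNetwork: true          # use the VM's network, not the pod network
      containers:
        - name: goose
          image: registry.example.com/goose:latest   # hypothetical image
          args: ["--manager", "--expect-workers", "10"]
```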

Speaker:

Excellent.

Speaker:

So how can I now look at it?

Speaker:

As soon as the Kubernetes one is up, can I now look

Speaker:

at whether it is waiting for workers?

Speaker:

Can I see the output of that as well?

Speaker:

You can, once it's done creating the container.

Speaker:

So what it's doing right now is it's pulling the container down

Speaker:

from our container registry.

Speaker:

How long does container deployment usually take with Kubernetes?

Speaker:

It scales by the size of the container.

Speaker:

And our container is actually quite large, because I have not put the

Speaker:

effort into making it smaller yet.

Speaker:

Okay.

Speaker:

So just downloading a few gigabytes of data, just for the distribution and

Speaker:

all the rest of the dependencies, et cetera.

Speaker:

Exactly.

Speaker:

Okay.

Speaker:

And now this is our Goose worker. Same container, but different

Speaker:

arguments to the container, obviously.

Speaker:

We have some pod anti-affinity rules here, which are kind of interesting.

Speaker:

Basically.

Speaker:

This is me saying to Kubernetes:

Speaker:

I don't want you to schedule this on any node that already has a worker running,

Speaker:

and I don't want you to schedule it on any node that has the manager running.

Speaker:

So it will distribute it to every single node so that

Speaker:

there won't be an inactive one.
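In manifest form, the anti-affinity rules described here might look roughly like this inside the worker pod spec (the label names are assumptions):

```yaml
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      # Never put two workers on the same node...
      - labelSelector:
          matchExpressions:
            - key: app
              operator: In
              values: ["goose-worker"]
        topologyKey: kubernetes.io/hostname
      # ...and never put a worker on the node running the manager.
      - labelSelector:
          matchExpressions:
            - key: app
              operator: In
              values: ["goose-manager"]
        topologyKey: kubernetes.io/hostname
```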

Speaker:

Yeah.

Speaker:

You wanted to see this.

Speaker:

So, there.

Speaker:

Before I start the workers, the manager is now running, and we can do a

Speaker:

kubectl logs and there are logs.

Speaker:

Nice.

Speaker:

and it's waiting for 10 workers.
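Checking the manager and then submitting the workers is plain kubectl; the deployment and file names here are hypothetical:

```shell
# Tail the manager's log; it reports that it is waiting for workers.
kubectl logs deployment/goose-manager

# Submit the worker Deployment (replicas spread out by the anti-affinity rules).
kubectl apply -f goose-workers.yaml
```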

Speaker:

So let's create the workers.

Speaker:

Oh,

Speaker:

I fixed this on the other one.

Speaker:

So what it's complaining about is these pod affinity rules.

Speaker:

It needs a topology key for each one.

Speaker:

And so that one's responsible.

Speaker:

So this topology key essentially means that it's per host

Speaker:

name, not per something else.

Speaker:

Okay.

Speaker:

Per

Speaker:

host name instead of per zone, per region, per rack.

Speaker:

So I could also decide to

Speaker:

deploy just one to each region, essentially.

Speaker:

And now we have our workers spinning up.

Speaker:

And so this is what was happening last time as well.

Speaker:

It's just, this was happening when you ran Terraform apply.

Speaker:

So it would bring up all the nodes and then each node would be doing this,

Speaker:

but without our direct interaction,

Speaker:

I think I like the manager way more.

Speaker:

It's nice to have a little bit of control and to easily take a look at logs.

Speaker:

Yeah.

Speaker:

We have already started to send users.

Speaker:

That's great.

Speaker:

Yep.

Speaker:

As part of this, the ulimit issues we were hitting: in the old

Speaker:

one, we weren't changing the ulimit.

Speaker:

So we had a maximum of 1,024 open file descriptors.

Speaker:

we are actually inheriting a ulimit fix that K3s pushes in when

Speaker:

it installs, which is helpful.
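As a generic aside (not part of the demo setup), the limit is easy to inspect and raise in a shell:

```shell
# Show the current soft limit on open file descriptors; stock Linux
# often defaults to 1024, which thousands of concurrent Goose users
# (each holding open sockets) will quickly exhaust.
ulimit -n

# Raise the soft limit up to the hard limit for this shell session.
ulimit -n "$(ulimit -Hn)"
ulimit -n
```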

Speaker:

That's indeed helpful if they have already solved the problem for us,

Speaker:

So that's starting up, you can see.

Speaker:

These are workers transitioning to the running state.

Speaker:

we can look at the logs of the workers as well.

Speaker:

Which is kind of cool.

Speaker:

If you think that these workers are in various regions, I

Speaker:

can just run something to pull logs from, like, central Europe.

Speaker:

Japan is cooler than central Europe.

Speaker:

Yeah,

Speaker:

that's true.
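Pulling logs for a worker in a particular region could look like this, assuming the pods carry an app label and you pick the pod by the node it landed on (all names are hypothetical):

```shell
# See which worker pod landed on which (region-tagged) node.
kubectl get pods -o wide -l app=goose-worker

# Follow one worker's log to watch its ramp-up
# ("goose-worker-abc123" is a hypothetical pod name).
kubectl logs goose-worker-abc123 --follow
```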

Speaker:

Okay.

Speaker:

And our load tests should be starting.

Speaker:

I think it was around 15 seconds, right, before it kicks in.

Speaker:

Yeah, there we go.

Speaker:

Oh my God.

Speaker:

how fast do we have to ramp up right now?

Speaker:

If you add a hundred and you start going up, it crushes our site.

Speaker:

So the ramp-up is slower, but it kind of needs to be.

Speaker:

For sure.

Speaker:

And actually, why don't we look at one of the workers so we can see the ramp-up.

Speaker:

That's

Speaker:

freaking cool.

Speaker:

This is all...

Speaker:

This will all be open source on the Tag1 server?

Speaker:

It's already there.

Speaker:

There's one thing that you should know about it currently:

Speaker:

when you spin up a Kubernetes cluster, and I'll just talk about this

Speaker:

while it's ramping up, there's a network.

Speaker:

There's like a backend network that all the pods communicate on.

Speaker:

That is not a real network.

Speaker:

It's a network that's

Speaker:

Kubernetes-specific, and this particular distribution uses Flannel for that.

Speaker:

It's even a pluggable thing.

Speaker:

So you can decide what you want your network control plane to be.

Speaker:

Flannel is detecting the wrong IP address: it's detecting the internal IP

Speaker:

address, not the public IP address.

Speaker:

I haven't personally fixed that because there's a bug open about

Speaker:

it with traction, but I'm going to push on the bug to fix that.

Speaker:

So as it stands, if you want to just do something like this,

Speaker:

where you're going to be using the host network, which is the network of

Speaker:

the VM, not the network of the pod,

Speaker:

That's not an issue.

Speaker:

but if you want to do something like Interpod communication,

Speaker:

that will be an issue.

Speaker:

So if you just pull it down, you would have to fix that.

Speaker:

I'll probably dump the URI of the bug in question in the README

Speaker:

just so that people can track that.

Speaker:

So if I wanted to do something like that, would I need to apply

Speaker:

a patch, or what do we need?

Speaker:

Okay.

Speaker:

Yeah.

Speaker:

In that issue.

Speaker:

There's a little deployment you can deploy that will go in and

Speaker:

change the annotation

Speaker:

on the nodes to point to the correct IP, and then Flannel will update.

Speaker:

So it's a bonus.

Speaker:

So you had to get the external IPs manually and put them into configuration

Speaker:

or

Speaker:

No, I, well... so we're at 20 gigabits,

Speaker:

but while that's happening

Speaker:

where's a good place for that.

Speaker:

Sure.

Speaker:

So I am running curls against the EC2 metadata service to grab the public IP

Speaker:

and the region before doing the install and then passing it to the installer.
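The curls in question would look roughly like this; these are the standard EC2 instance-metadata paths, while the variable names and surrounding script are assumptions:

```shell
# The metadata service is only reachable from inside the instance itself.
PUBLIC_IP=$(curl -s http://169.254.169.254/latest/meta-data/public-ipv4)
REGION=$(curl -s http://169.254.169.254/latest/meta-data/placement/region)
# ...then pass both values to the K3s installer.
```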

Speaker:

Which is actually a huge security vulnerability.

Speaker:

If the wrong people get access to that.

Speaker:

But it is very useful for setting things up.

Speaker:

Like that's how, that's how an EC2 VM knows stuff about itself.

Speaker:

Oh we got our first error.

Speaker:

I've noticed that at about 26 gigabits per second, you start

Speaker:

pulling errors every once in a while.

Speaker:

Is there a corollary chart to our Fastly bill here?

Speaker:

Yeah.

Speaker:

Yeah, there really is.

Speaker:

We've been testing this a while, and I don't know.

Speaker:

It just never occurred to me that the Fastly bill would be high, but it is.

Speaker:

Man, that's amazing.

Speaker:

I think we should be at around 25 at this point,

Speaker:

when you see where we're at.

Speaker:

Yes, we've launched all the users.

Speaker:

So we should just sustain at around here.

Speaker:

And this, people, is how you test a CDN, as you can see.

Speaker:

And you can see, we have fewer locations near the Asia Pacific

Speaker:

POPs, but we're holding five gigabits per second in Asia Pacific, and then 10

Speaker:

and 10 in Europe and North America.

Speaker:

Yep.

Speaker:

And just, just again,

Speaker:

to reiterate that a little bit:

Speaker:

Our Goose users are not real users.

Speaker:

Like they are way faster usually.

Speaker:

that's why they also create this crazy bandwidth, but every user is

Speaker:

downloading all the... I mean, we have little breaks in there,

Speaker:

but all of the users are also downloading all the assets.

Speaker:

So when a page is loaded from Umami, like a nice recipe, which we are talking

Speaker:

about, then all those images are also downloaded. It's like real browsers

Speaker:

browsing the site, though the JavaScript is not executed, because we're not doing that.

Speaker:

But it's really parsing a lot.

Speaker:

It's ensuring everything is correct in that.

Speaker:

And you're doing this with 2000 workers on 10 nodes, 20,000 workers in total.

Speaker:

Right.

Speaker:

Yeah.

Speaker:

users.

Speaker:

Yeah.

Speaker:

So 2000 users per node over 10 minutes and keep two nodes in each region.

Speaker:

Yeah.

Speaker:

So those 20,000 users are now hitting this site real hard.

Speaker:

Probably the user doesn't click as fast as we are clicking here.

Speaker:

So it's probably more like 200,000 or so.

Speaker:

Generating this kind of traffic. Amazing.

Speaker:

Now really cool.

Speaker:

Oh, then we have an error.

Speaker:

And this was the error we got. But on our old setup, I

Speaker:

would have no real way to do that.

Speaker:

We were pushing logs centrally (and by the way, our bill for the

Speaker:

central logging was also great), but there was no real

Speaker:

way to separate and look at individual workers or anything like that.

Speaker:

So this is a big improvement as far as manageability.

Speaker:

Yeah, Datadog probably was not as amused with that many logs.

Speaker:

No, they emailed me.

Speaker:

Yeah.

Speaker:

"Can we schedule a call to discuss your new usage?"

Speaker:

Sorry.

Speaker:

It was just a one-off.

Speaker:

We need to show the world how to test Fastly.

Speaker:

what's nice about Goose and what you can see here is that every

Speaker:

error is very nicely reported.

Speaker:

Goose just got a new patch in that also allows us to get an overview of

Speaker:

all the errors that ever happened like for all the workers and everything.

Speaker:

So this will be a very nice new feature.

Speaker:

That's landing in the next release, so that you don't have to

Speaker:

look through the log for what errors you have; you'll get an aggregated

Speaker:

per-error-type report in the end.

Speaker:

And I think we can essentially stop.

Speaker:

We can see Fastly is handling 25 gigabits per second easily.

Speaker:

There's no, there's no real reason to make our bill higher.

Speaker:

So what we can do is just

Speaker:

delete the deployments.

Speaker:

And just terminate them all.
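The forceful teardown amounts to deleting the two Deployments (names hypothetical):

```shell
# Deleting the Deployments terminates every worker pod in every region.
kubectl delete deployment goose-workers goose-manager
```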

Speaker:

Not the nicest way to stop a load test, but yeah,

Speaker:

no, no, but it is very forceful.

Speaker:

Yeah.

Speaker:

in theory we could have also given the manager, essentially, a stop signal.

Speaker:

And then it would have given us a nice end of load test report

Speaker:

which can also be in HTML.

Speaker:

And this, we can show that again some other time, but yeah.

Speaker:

Oh, and one thing you can do with this (not right now, because they're

Speaker:

terminating), but if you want to, you can actually exec into these containers.

Speaker:

So you can pull a bash prompt from any of these containers, even

Speaker:

the ones in different regions.
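That exec is the standard kubectl mechanism; the pod name below is hypothetical:

```shell
# Open an interactive shell inside a worker pod, even one on a node in
# another region; the Kubernetes API server proxies the session.
kubectl exec -it goose-worker-abc123 -- /bin/bash
```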

Speaker:

Not something I did.

Speaker:

That's just Kubernetes.

Speaker:

I mean, for sure.

Speaker:

No, I love this new K3s. Is it an acronym for something, K3s?

Speaker:

Okay.

Speaker:

No. So, the acronym for Kubernetes is K8s,

Speaker:

so K3s would be their joke as it's a lighter version.

Speaker:

It was really cool.

Speaker:

It's literally a single binary and the install, like the install deals with

Speaker:

all the prerequisites and then places the binary and spins up a systemd

Speaker:

unit file that sets everything up.
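The install really is a one-liner; these are the quick-start commands from the K3s project docs (the server IP and token are placeholders):

```shell
# On the server (manager) node:
curl -sfL https://get.k3s.io | sh -

# On each agent (worker) node, pointing at the server:
curl -sfL https://get.k3s.io | K3S_URL=https://<server-ip>:6443 K3S_TOKEN=<token> sh -
```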

Speaker:

Kubernetes by default has, like, an HA multi-instance datastore, and

Speaker:

they replaced that with SQLite.

Speaker:

Like that sort of thing.

Speaker:

This feels really, really cool.

Speaker:

And I think that's really nice too, for multi-region services

Speaker:

running so easily.

Speaker:

Great.

Speaker:

Yep.

Speaker:

I think it's a step up from what we had before.

Speaker:

Absolutely.

Speaker:

Not only what we shared before, but also what you had to do before.

Speaker:

I remember SSH-ing into four different machines to start a load test,

Speaker:

to start a Locust test, manually starting eight workers on each.

Speaker:

And so, yeah, it's, it's really nice to have everything automated that way.

Speaker:

And now I'm just bringing it down.

Speaker:

Awesome.

Speaker:

Thank you guys so much.

Speaker:

That was really cool.

Speaker:

We look forward to our Fastly and Datadog bills.

Speaker:

I really appreciate you guys coming back to show us how that works.

Speaker:

We will do another Goose webinar in our series here in a couple of weeks.

Speaker:

So please stay tuned.

Speaker:

We're going to make this a regular series where we show you how

Speaker:

to use different features and functionality as we release them.

Speaker:

And also show you how to use the tool and profile websites and

Speaker:

effectively performance tune, not just get it up and running and, and

Speaker:

slamming your site with traffic.

Speaker:

The links we mentioned, we'll throw into the show notes and the description.

Speaker:

You can check out these other Goose talks at Tag1.com/goose,

Speaker:

G-O-O-S-E. That's where we have links to documentation and code and all of

Speaker:

the talks and videos that we've done.

Speaker:

If you have any questions about Goose, the product, and how to use it,

Speaker:

please head over to the GitHub issue

Speaker:

queues and ask them over there.

Speaker:

And if you want to get engaged and contribute, we'd love it.

Speaker:

If you have input and feedback on the talk itself, please contact us at

Speaker:

Tag1teamtalks@tag1.com.

Speaker:

Please remember to upvote, share, and subscribe with all your friends.

Speaker:

You can check out our past Tag1 Team Talks at tag1.com/ttt, for Tag1 Team Talks.

Speaker:

Again, a huge thank you, Fabian and Narayan, for joining us, and thank you to everybody

Links