How to Load Test with Goose - Part 2: Running a Gaggle
Episode 73 • 12th July 2021 • Tag1 Team Talks | The Tag1 Consulting Podcast • Tag1 Consulting, Inc.


Shownotes

In this second part of our team talk series on live load testing with Goose, we focus on demonstrating load testing using a Gaggle. A Gaggle is a distributed load test, run with Goose from one or more servers. Here, we're testing with 20,000 users using ten Workers and a Manager process on servers spun up using Terraform.

CEO Jeremy Andrews, the creator of Goose; Fabian Franz, VP of Software Engineering; CTO Narayan Newton; and Managing Director Michael Meyers demonstrate running a Goose Gaggle and discuss how variations on these load tests change testing results, and what you can expect from a Gaggle. Our goal is to prove to you that Goose is both the most scalable load testing framework currently available, and the easiest to scale.

Narayan is a key member of the Drupal.org infrastructure team, responsible for ensuring the site stays up under load or attack. Load testing is a critical part of ensuring the continued success of Drupal.org, or of any website.

For more Goose content, see Goose Podcasts, Blogs, Presentations, & more!


Transcripts

Speaker:

Hello, and welcome to Tag1 Team Talks, the podcast and blog of Tag1 Consulting.

Speaker:

Today, we're going to be doing a distributed load testing how-to: a deep dive into

Speaker:

running a Gaggle with Tag1's open source Goose load testing framework.

Speaker:

Our goal is to prove to you that Goose is both the most scalable load

Speaker:

testing framework currently available,

Speaker:

and the easiest to scale.

Speaker:

We're going to show you how to run a distributed load test yourself.

Speaker:

And we're going to provide you with lots of code and examples to make it easy and

Speaker:

possible for you to do this on your own.

Speaker:

I'm Michael Meyers, the managing director at Tag1, and joining

Speaker:

me today is a star-studded cast.

Speaker:

We have Jeremy Andrews, the founder and CEO of Tag1,

Speaker:

who's also the original creator of Goose; Fabian Franz, our VP of Technology,

Speaker:

who's made major contributions to Goose, especially around

Speaker:

performance and scalability.

Speaker:

And Narayan Newton, our CTO, who has set up and put together all the

Speaker:

infrastructure that we're going to be using to run these load tests.

Speaker:

Jeremy, why don't you take it away?

Speaker:

Give us an overview of what we're going to be covering and let's jump into it.

Speaker:

Yeah.

Speaker:

So last time we were exploring setting up a load test from a single

Speaker:

server and confirmed that Goose makes great use of that server.

Speaker:

It leverages all the CPUs and ultimately tends to get as far as it

Speaker:

can until the uplink slows it down.

Speaker:

So today what we're going to do is

Speaker:

use a feature of Goose called a Gaggle, which is a distributed load test.

Speaker:

If you're familiar with Locust, it is like a swarm.

Speaker:

The way that this works with Goose is you have a manager

Speaker:

process that you kick off and you say, I want to simulate 20,000 users and I'm

Speaker:

expecting 10 workers to do this load.

Speaker:

The manager process prepares things, and all the workers then connect

Speaker:

in through a TCP port and it sends each of them a batch of users to run.

Speaker:

And then the manager coordinates a start, so each of the

Speaker:

workers start at the same time.

Speaker:

And then they send their statistics back to the manager so that you can

Speaker:

actually see what happened in the end.
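
For reference, kicking off a Gaggle from the command line looks roughly like this. This is a sketch based on Goose's documented Gaggle flags at the time; the build target and hostnames are placeholders:

```sh
# On the manager: Gaggle support is behind the "gaggle" feature flag.
# Wait for 10 workers, then coordinate 20,000 total users between them.
cargo run --release --features gaggle -- \
  --manager --expect-workers 10 \
  --users 20000 --host https://www.example.com/ -v

# On each worker: connect to the manager (TCP port 5115 by default)
# and wait to be assigned a batch of users.
cargo run --release --features gaggle -- \
  --worker --manager-host goose-manager.example.com -v
```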

Speaker:

What this nicely solves is if your uplink can only do so much traffic,

Speaker:

or if you want traffic coming from multiple regions around the world you

Speaker:

could let Goose manage that for you in all of these different servers.

Speaker:

So today Narayan has set up a pretty cool test where we're going to

Speaker:

be spinning up a lot of workers.

Speaker:

and he can talk about how many. Each one is not going to be working too hard.

Speaker:

They'll run maybe a thousand users per server.

Speaker:

which means it'll be at least 50% idle.

Speaker:

It won't be maxing out the uplink on any given server.

Speaker:

but in spite of that, we're going to show that working together in a Gaggle

Speaker:

we can generate a huge amount of load.

Speaker:

so now Narayan, if you can talk about what you've set up here.

Speaker:

Sure.

Speaker:

so what I built today is basically a simplistic Terraform tree.

Speaker:

what is interesting about this is that we wanted to distribute the load between

Speaker:

different regions and for those people that have used Terraform in the past

Speaker:

that can be slightly odd in that you can only set one region for each AWS provider

Speaker:

that Terraform uses to spin things up.

Speaker:

So how we've done this is we've defined multiple providers, one

Speaker:

for each region and a module that spins up our region workers.

Speaker:

And we basically initialize multiple versions of the module

Speaker:

passing each a different region.
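
A minimal sketch of that provider-per-region pattern (module and variable names here are illustrative, not necessarily what the actual repository uses):

```hcl
# One aliased AWS provider per region.
provider "aws" {
  alias  = "us_west"
  region = "us-west-2"
}

provider "aws" {
  alias  = "eu_central"
  region = "eu-central-1"
}

# The same region-worker module, instantiated once per region,
# each instance bound to a different aliased provider.
module "workers_us_west" {
  source       = "./region-worker"
  providers    = { aws = aws.us_west }
  worker_count = 2
}

module "workers_eu_central" {
  source       = "./region-worker"
  providers    = { aws = aws.eu_central }
  worker_count = 2
}
```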

Speaker:

So in the default test, we spin up 10 worker nodes in various regions.

Speaker:

the Western part of the United States, the Eastern part of the

Speaker:

United States, Ireland, Frankfurt,

Speaker:

India, and Japan. With how the test currently works,

Speaker:

it's the load testing truss, which is what we decided to call it;

Speaker:

it's a little limited because once you start it, you can't really

Speaker:

interact with the workers themselves.

Speaker:

They start up, they pull down Goose and they run the test.

Speaker:

The next revision of this would be something that has a clustering agent

Speaker:

between the workers, so that you can actually interact with the workers

Speaker:

after they start. It gets very annoying to have to run Terraform to stand up

Speaker:

these VMs all over the world, and then you want to make a change to them.

Speaker:

You have to destroy all of them and then relaunch them which isn't terrible.

Speaker:

But as a testing sequence,

Speaker:

it adds a lot of time, just because it takes time to

Speaker:

destroy and recreate these VMs.
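
In practice, that iteration loop is just the standard Terraform cycle, repeated for every change:

```sh
# Tear down every VM in every region, then stand them all back up;
# slow, because it waits on instance termination and creation worldwide.
terraform destroy -auto-approve
terraform apply -auto-approve
```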

Speaker:

So the next revision of this would be something other than Goose

Speaker:

creating a cluster of these VMs.

Speaker:

how it currently works is that we're using Fedora CoreOS so that we have

Speaker:

a consistent base at each location.

Speaker:

And so I only have to send it a single file for initialization.

Speaker:

And then Fedora CoreOS pulls down a container that has the Goose load test

Speaker:

and a container that has a logging agent so that we can monitor the workers

Speaker:

and send all the logs from the Goose agents back to a central location.

Speaker:

I had a quick question. So, Narayan, the basic setup is that we have EC2

Speaker:

instances, like on AWS, and then we run containers like normal Kubernetes

Speaker:

like on them, or how is it working?

Speaker:

It's using Docker.

Speaker:

So that is the big thing that I want to improve.

Speaker:

And I almost got there before today.

Speaker:

What would be nicer is if we could use one of the IoT distributions or

Speaker:

Kubernetes-at-the-edge distributions to run a very slim version of Kubernetes on each

Speaker:

worker node so that we get a few things.

Speaker:

One is cluster access, so we can actually interact with the cluster: spread

Speaker:

load, run multiple instances of Goose.

Speaker:

It would be interesting to pack multiple instances of Goose on things

Speaker:

like the higher-end ones, and also be able to actually edit the cluster after

Speaker:

it's up and not have to destroy it and recreate it each time.

Speaker:

The other thing is to get containerd and not Docker.

Speaker:

just because there are some issues that you can hit with that.

Speaker:

as it stands right now, CoreOS ships with Docker running, and that's how

Speaker:

you interact with it for the most part, via systemctl and Docker, but you could

Speaker:

also use Podman, but I ran into issues with that for redirecting the logs.

Speaker:

So we are actually using Docker itself and Docker is just running

Speaker:

the container as you would in a local development environment.

Speaker:

So what we are missing from a standard Kubernetes deployment, that

Speaker:

we would normally have is the ability to deploy a new container.

Speaker:

You were saying that if I want to deploy a new container, with this

Speaker:

simplistic infrastructure right now, I need to shut down the EC2

Speaker:

instances and then start them up again.

Speaker:

Okay.

Speaker:

So that's the thing. Like, what I did before this test: Jeremy released

Speaker:

a new branch with some changes to make this load test faster on startup.

Speaker:

What I did to deploy that is run terraform destroy, wait for it to kill

Speaker:

all the VMs across the world, and then terraform apply and wait for it to

Speaker:

recreate all those VMs across the world.

Speaker:

And like, that is a

Speaker:

management style, honestly, but in this specific case, because we're

Speaker:

sometimes doing micro-iterations, it can get really annoying.

Speaker:

Yeah, for sure.

Speaker:

No, no, that makes perfect sense.

Speaker:

I just want to understand, because I was like, in this container world, you

Speaker:

can just deploy a new container, but obviously you need a manager for that.

Speaker:

Yes.

Speaker:

Yes.

Speaker:

I could totally deploy a new container.

Speaker:

So what I could do is have Terraform output the list of IPs, and then I can SSH

Speaker:

to each of them and pull a new container.

Speaker:

But at that point,
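
That manual workaround might look something like the loop below. It assumes a Terraform output named worker_addresses, CoreOS's default core user, and a goose systemd unit; all three names are illustrative:

```sh
# Ask Terraform for the worker IPs, then refresh the container on each.
for ip in $(terraform output -json worker_addresses | jq -r '.[]'); do
  ssh "core@${ip}" 'sudo docker pull tag1consulting/goose && sudo systemctl restart goose'
done
```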

Speaker:

But seriously, there's another Git repository that I have started.

Speaker:

The version of this uses a distribution of Kubernetes called

Speaker:

K3s, which is designed for CI systems, IoT, and deployments to the edge.

Speaker:

It's a single-binary version of Kubernetes where everything is

Speaker:

wrapped into a single binary and starts on edge nodes and then can connect

Speaker:

them all together and so we could have a multi-region global cluster

Speaker:

of these little Kubernetes agents.

Speaker:

And then we could spin up Gooses on that.

Speaker:

And that I think will actually work.
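
For the curious, K3s's documented bootstrap really is close to a one-liner per node; the server hostname and token below are placeholders:

```sh
# On the node acting as the K3s server:
curl -sfL https://get.k3s.io | sh -

# On each edge node, join the cluster by pointing at the server:
curl -sfL https://get.k3s.io | \
  K3S_URL=https://k3s-server.example.com:6443 \
  K3S_TOKEN=<node-token-from-the-server> sh -
```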

Speaker:

You totally blew my mind.

Speaker:

So now you've just signed up for a follow-up to show that.

Speaker:

Because, I mean, that's what you want, actually. But now I'm

Speaker:

really curious: how does this Terraform configuration actually look?

Speaker:

Can you share a little bit about it?

Speaker:

So this is the current tree.

Speaker:

If everyone can see that it's pretty simplistic.

Speaker:

So this is the main file that gets loaded.

Speaker:

And then for every region, there's a module that is named after that region.

Speaker:

They're all hitting the same actual module; it's just different

Speaker:

versions of this module.

Speaker:

And then they'll take a worker count and their region and their provider

Speaker:

and the provider is what is actually separating them into regions.

Speaker:

And then if you look at the region worker, which is where most of these

Speaker:

things are happening, there's a variables file, which is interesting

Speaker:

because I have to define an AMI map because every region has a different

Speaker:

AMI because the regions are disparate.

Speaker:

Like, there's no consensus between these regions for images.

Speaker:

So one of the reasons I picked CoreOS is because it exists in each of these regions

Speaker:

and can handle a single start-up file.

Speaker:

When we do the K3s version of this, K3s can run on Ubuntu,

Speaker:

and Ubuntu obviously exists in all these regions as well.

Speaker:

but I'll still have to do something like this, or there's another way I can do it,

Speaker:

but this was the way to do it for CoreOS.

Speaker:

And then we set instance type; this is just a default.

Speaker:

And then the main version of this is very simple.

Speaker:

We initialize our key pair, 'cause I want to be able to SSH into these instances at

Speaker:

some point and upload it to each region.

Speaker:

We initialize a simple security group that allows me to SSH in to each region.

Speaker:

And then a simple instance that doesn't really have much; it

Speaker:

doesn't even have a large root device, 'cause we're not using it at all.

Speaker:

Basically we're just spinning up a single container and then pushing the logs to

Speaker:

Datadog, which is our central log agent.

Speaker:

So even the logs aren't being written locally on that VM. We

Speaker:

associated a public IP address.

Speaker:

We spin up the AMI, we look up which AMI we should use based on our region.

Speaker:

and then we output the worker address.
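
Pieced together, the region-worker module he's describing looks roughly like this; resource names, AMI IDs, and the instance type are illustrative:

```hcl
variable "region" {}
variable "worker_count" { default = 2 }

# Every region needs its own AMI ID for the same Fedora CoreOS release.
variable "ami_map" {
  type = map(string)
  default = {
    "us-west-2"    = "ami-0123456789abcdef0" # placeholder IDs
    "eu-central-1" = "ami-0fedcba9876543210"
  }
}

# Key pair so we can SSH in; uploaded per region.
resource "aws_key_pair" "ssh" {
  key_name   = "goose-worker"
  public_key = file("~/.ssh/id_rsa.pub")
}

resource "aws_instance" "worker" {
  count                       = var.worker_count
  ami                         = var.ami_map[var.region]
  instance_type               = "c5n.large" # placeholder
  key_name                    = aws_key_pair.ssh.key_name
  associate_public_ip_address = true
  # The Fedora CoreOS Ignition config is passed as user data.
  user_data                   = file("${path.module}/worker-ignition.json")
}

output "worker_addresses" {
  value = aws_instance.worker[*].public_ip
}
```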

Speaker:

So the other part of this is the manager.

Speaker:

The only real difference in this, which we basically spin up the exact same

Speaker:

way, is we also allow the Goose port, which is 5115, and we spin up a DNS

Speaker:

record that points to our manager because that DNS record is what all the

Speaker:

region workers are going to point at.

Speaker:

And we make use of the fact that they're all using Route 53.

Speaker:

So this update propagates really quickly.
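
Sketching just the manager-only additions (the zone ID, record name, and manager instance reference are assumptions):

```hcl
# Workers connect in on Goose's default Gaggle port.
resource "aws_security_group" "goose_manager" {
  name = "goose-manager"

  ingress {
    from_port   = 5115
    to_port     = 5115
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# The DNS record all the region workers point at; a low TTL in
# Route 53 means the update propagates quickly.
resource "aws_route53_record" "manager" {
  zone_id = var.zone_id
  name    = "goose-manager.example.com"
  type    = "A"
  ttl     = 60
  records = [aws_instance.manager.public_ip]
}
```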

Speaker:

And that's basically it. It's pretty simple.

Speaker:

each VM is running.

Speaker:

Sorry, go ahead.

Speaker:

where do you actually put in the Goose part?

Speaker:

Because I've seen the VM.

Speaker:

Yep.

Speaker:

So each CoreOS VM can take an Ignition file.

Speaker:

The idea behind CoreOS is it was a project to simplify infrastructure

Speaker:

that was based on containers.

Speaker:

It became an underlying part of a lot of Kubernetes deployments

Speaker:

because it's basically read only in essence on a configuration level.

Speaker:

It can even auto-update itself.

Speaker:

it's a very interesting way of dealing with an operating system.

Speaker:

Its entire concept is that you don't really interact with it outside of containers.

Speaker:

It's just a stable base for containers that remains secure, can auto-update, is

Speaker:

basically read-only in its essence, and it takes these Ignition files that define

Speaker:

how it should set itself up on first boot.

Speaker:

So if we look at one of these ignition files,

Speaker:

okay.

Speaker:

we can see that it's basically YAML.

Speaker:

And we define the SSH key

Speaker:

we want to get pushed.

Speaker:

We define an /etc/hosts file to push.

Speaker:

We then define some systemd units, which include turning off SELinux

Speaker:

because we don't want to deal with that on short-lived workers.

Speaker:

And then we define the Goose service, which pulls down the image.

Speaker:

And right here actually starts Goose.

Speaker:

This is mostly defining the log driver, which ships logs back to

Speaker:

Datadog; the actual logging agent is started here.

Speaker:

but then like, this is one of the workers.

Speaker:

So we pull the temp umami branch of Goose.

Speaker:

We start it up, set it to worker.

Speaker:

Point it to the manager host, set it to be somewhat verbose, set the log

Speaker:

driver to be Datadog, and start up Datadog so that we get metrics in the logs.

Speaker:

And then that's just how it runs.
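
The worker's systemd unit boils down to a couple of docker run invocations along these lines. The image name, branch tag, and manager hostname are assumptions from the description, and the Datadog agent's socket-based log collection is shown as one common way to ship container logs:

```sh
# Start the Goose worker, pointed at the manager; restart it forever so
# the same infrastructure can be reused for multiple test runs.
docker run -d --name goose --restart always \
  tag1consulting/goose:umami \
  --worker --manager-host goose-manager.example.com -v

# Run the Datadog agent alongside it, tailing container logs via the
# Docker socket and shipping them to a central location.
docker run -d --name datadog-agent \
  -e DD_API_KEY=<your-api-key> \
  -e DD_LOGS_ENABLED=true \
  -e DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL=true \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  gcr.io/datadoghq/agent:7
```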

Speaker:

And this will restart over and over and over again.

Speaker:

So you can actually run multiple tests with the same infrastructure.

Speaker:

You just have to restart Goose on the manager and then the workers will

Speaker:

kill themselves and then restart.

Speaker:

And so you get this plan, where it shows you all the

Speaker:

instances it's going to spin up.

Speaker:

It's actually fairly long, just because there are a lot of params

Speaker:

for each EC2 instance. We're spinning up 11 of them, 10 plus

Speaker:

the manager. You say that's fine,

Speaker:

and it goes.

Speaker:

And I will stop sharing my screen now, as this is going to take a bit.

Speaker:

So is this already doing something now?

Speaker:

Yes.

Speaker:

And this is, you're probably going to see one of the quirks.

Speaker:

and this is another thing I dislike about this.

Speaker:

Because we're using CoreOS, these are all coming up on an outdated AMI, and

Speaker:

they're all going to reboot right there,

Speaker:

because they come up, they start pulling the Goose container and

Speaker:

then they start the update process and they're not doing anything.

Speaker:

So at that point, they think it's safe to update.

Speaker:

And so they update and reboot.

Speaker:

It's somewhat cool that that has no impact on anything: the entire infrastructure

Speaker:

comes up, updates itself, reboots.

Speaker:

Then it continues on with what it's doing, but it's another

Speaker:

little annoyance that I just don't like.

Speaker:

You spin up this infrastructure and you don't really have

Speaker:

a ton of control over it.

Speaker:

And so this is the logs of the manager process of Goose, and it's just

Speaker:

waiting for its workers to connect.

Speaker:

They've all updated, rebooted, and Goose is starting on them.

Speaker:

As you can see, eight of them have completed that process.

Speaker:

Is all of this, the stuff that you put together here,

Speaker:

going to be available open source for folks to download and leverage?

Speaker:

Yep.

Speaker:

Awesome.

Speaker:

It's all online.

Speaker:

On our Tag1 Consulting GitHub organization, and the K3s version will be as well.

Speaker:

And that's the one I'd recommend you use.

Speaker:

This

Speaker:

one's real annoying.

Speaker:

I know I keep going on about it, but like, this is how skunkworks projects work.

Speaker:

You make the first revision and you hate it and then you

Speaker:

decide to never do that again.

Speaker:

And then you make the second revision.

Speaker:

Okay.

Speaker:

This is starting.

Speaker:

Now I'm going to switch my screen over to the web browser

Speaker:

so we can see what it's doing.

Speaker:

Sure.

Speaker:

Great.

Speaker:

The logs that we're seeing there, are they coming from Datadog,

Speaker:

or just the manager directly?

Speaker:

And so

Speaker:

that was a direct connection to the manager.

Speaker:

If we go over to Datadog here, these are going to be the logs.

Speaker:

As you can see, the host is just like what an EC2 host name looks

Speaker:

like, and they're all changing, but we're getting logs from every agent.

Speaker:

As well as the worker.

Speaker:

You can see they're launching.

Speaker:

If we go back to Fastly, we can see that they're getting global traffic.

Speaker:

So we're getting traffic on the West coast, the East

Speaker:

coast, Ireland, Frankfurt, and

Speaker:

Mumbai,

Speaker:

and the bandwidth will just keep ramping up from here

Speaker:

For Datadog, is there a way to also filter by the manager, like,

Speaker:

sure.

Speaker:

this is the live tail.

Speaker:

We'll go to the past 15 minutes, and then you can go service:goose.

Speaker:

And then we have worker and manager, so I can do all my workers.

Speaker:

And that's, sorry, only the manager.

Speaker:

The manager is pretty quiet.

Speaker:

The workers are not.

Speaker:

You must've disabled

Speaker:

displaying metrics regularly.

Speaker:

'Cause I would have expected to see that on the server.

Speaker:

if I did, I did not intend to, but I probably did.

Speaker:

Can we, is it easy to quickly see what command you passed in or not to go back

Speaker:

there from where you're at right now?

Speaker:

It's in Terraform, I think.

Speaker:

It is all set here.

Speaker:

So it

Speaker:

must be interesting.

Speaker:

I have to figure out why you're not getting statistics on the

Speaker:

manager because you should be getting statistics on the manager.

Speaker:

Is this the log you're tailing, or is this what's verbosely put out to the screen?

Speaker:

This is

Speaker:

what is put out to the screen.

Speaker:

Yeah.

Speaker:

Interesting.

Speaker:

Okay.

Speaker:

I would have expected statistics every 30

Speaker:

seconds.

Speaker:

So what's kind of interesting is you can expand this in Fastly and see we're

Speaker:

doing significantly less traffic in Asia Pacific, but that makes sense.

Speaker:

Considering we're only hitting one of the PoPs and then Europe and North

Speaker:

America tends to be about the same, but you can even drill down further,

Speaker:

one quick question.

Speaker:

I saw you hard-code the IP address of the endpoint in the Terraform.

Speaker:

How does Fastly still know, essentially, which PoP to route to?

Speaker:

Are they doing it through magic?

Speaker:

You mean I put the same IP address everywhere in /etc/hosts?

Speaker:

Yep.

Speaker:

Yeah.

Speaker:

It's because of how they're doing traffic.

Speaker:

So it is the same IP address everywhere, but the IP

Speaker:

address points to different things.

Speaker:

Basically.

Speaker:

It's cool.

Speaker:

A lot of CDNs do it that way.

Speaker:

so instead of different IP addresses, it's basically routing tricks.

Speaker:

We seem

Speaker:

to have maxed out.

Speaker:

Can you look at the

Speaker:

Yeah, this should be about it.

Speaker:

It should be all started at this point.

Speaker:

Yeah.

Speaker:

So we've launched a thousand users, we've entered Goose attack.

Speaker:

So we have evened out at 14.5 gigabits per second, which is, I think what we got

Speaker:

on one server with 10,000 users as well.

Speaker:

This is more, this is more than a single server.

Speaker:

I think we maxed out at nine gigabits.

Speaker:

Awesome.

Speaker:

Thank you guys.

Speaker:

All for joining us.

Speaker:

It was really cool to see that in action.

Speaker:

All the links we mentioned are going to be posted in the video summary and the

Speaker:

blog post that correlates with this.

Speaker:

Be sure to check out tag1.com/goose; that's tag, the number one, dot com.

Speaker:

That's where we have all of our talks, documentation, and links to GitHub.

Speaker:

There's some really great blog posts there that will show you

Speaker:

step-by-step with the code, how to do everything that we covered today.

Speaker:

So be sure to check that out.

Speaker:

If you have any questions about Goose, please post them to the Goose issue queues

Speaker:

so that we can share them with the community.

Speaker:

Of course, if you like this talk, please remember to upvote,

Speaker:

subscribe, and share it out.

Speaker:

You can check out our past Tag1 Team Talks on a wide variety of topics, from open

Speaker:

source and getting funding for your open source projects, to things like

Speaker:

decoupled systems and architectures for web applications, at tag1.com/tag1teamtalks.

Speaker:

As always, we'd love your feedback and input on both this episode, as

Speaker:

well as ideas for future topics.

Speaker:

You can email us at ttt@tag1.com. Again, a huge thank you to Jeremy, Fabian, and

Speaker:

Narayan for walking us through this and to everyone who tuned in today.

Speaker:

Really appreciate you joining us.
