How to Load Test with Goose - Part 1: Drupal 9 Umami on Pantheon with Fastly - Tag1 Team Talk
Episode 7 • 28th July 2021 • Tag1 Team Talks | The Tag1 Consulting Podcast • Tag1 Consulting, Inc.

Shownotes

Goose is the highly scalable load testing framework preferred by Tag1. In this series of Team Talks on Goose, we'll look at how Goose scales both on a single server and distributed across servers, as CEO Jeremy Andrews (the creator of Goose), VP of Software Engineering Fabian Franz, and Managing Director Michael Meyers walk through a demonstration of Goose load testing.

In this Team Talk, our load test runs in AWS, and we're testing against Drupal 9 with the default Umami install against a Tag1 Kubernetes cluster fronted by the Fastly CDN. These tests start with a limited set of user functions, and we'll ramp up from 1,000 users to 8,000 simultaneous users and show you how performance changes.

For more Goose content, see Goose Podcasts, Blogs, Presentations, & more!

Transcripts

Speaker:

Hello, and welcome to another Tag1 Team Talk, the podcast

Speaker:

and blog of Tag1 Consulting.

Speaker:

Today, we're going to be talking about how to load test with Goose.

Speaker:

It's going to be running in AWS, and we're going to test against Drupal 9 with the

Speaker:

default Umami install against a Tag1 Kubernetes cluster fronted by the Fastly

Speaker:

CDN.

Speaker:

I'm Michael Meyers, Managing Director of Tag1 Consulting.

Speaker:

And I'm joined today by Jeremy Andrews, the Founder and CEO of Tag1, along with

Speaker:

Fabian Franz, our VP of Technology.

Speaker:

Let's jump right in.

Speaker:

I'm excited to see how Goose works.

Speaker:

Jeremy, welcome.

Speaker:

Fabian, welcome.

Speaker:

Thank you guys so much for walking us through this.

Speaker:

Thank you.

Speaker:

Yeah, I'm going to start off by sharing my screen.

Speaker:

So if you're watching this on a video, you'll be able to

Speaker:

see what we're doing as we go.

Speaker:

To start off, quickly looking at Goose itself: Goose is written in Rust.

Speaker:

And if you go to Goose.rs, it'll take you to the GitHub page.

Speaker:

and inside there you'll find a pretty normal looking Rust code base.

Speaker:

For today.

Speaker:

The part we care about is within the examples folder and we've written

Speaker:

several examples, primarily to demonstrate how Goose can be used.

Speaker:

The first one is just called simple, and all that does is it loads an

Speaker:

endpoint and shows you how to write the most simple of load tests.

Speaker:

It uses a few features like wait times.

Speaker:

So it'll load a page, wait randomly between five and 15 seconds,

Speaker:

and then load the index page and about page and whatnot.

Speaker:

It does simple stuff.

Speaker:

It logs in.

Speaker:

then there's another version of that, where it was written to use

Speaker:

closures, a fancy feature of Rust, and Fabian wrote that version.

Speaker:

I find closures more difficult to understand, but they're

Speaker:

very, very flexible for programmatically doing things.

Speaker:

So it's, it's a super cool example to have in there.

Speaker:

So essentially, even if you don't know what a

Speaker:

closure is, what you can see is:

Speaker:

there's a paths array at the top, or vector, as it's called in Rust.

Speaker:

And you can essentially just extend this vector, whatever pages you want.

Speaker:

You can draw that from a CSV, from wherever, and you can have a

Speaker:

programmatic load test, essentially.
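The pattern being described can be sketched in plain Rust. This is a conceptual illustration of closures capturing entries from a vector, not the actual Goose API; the paths and function names here are made up for the example:

```rust
// Conceptual sketch: build one "task" per path as a boxed closure that
// captures its own path from the vector. In a real load test the closure
// body would issue an HTTP GET instead of formatting a string.
fn make_tasks(paths: &[&'static str]) -> Vec<Box<dyn Fn() -> String>> {
    paths
        .iter()
        .map(|&path| {
            let task: Box<dyn Fn() -> String> = Box::new(move || format!("GET {}", path));
            task
        })
        .collect()
}

fn main() {
    // Extend this vector with whatever pages you want, or draw it from a CSV.
    let paths = ["/", "/en/articles", "/en/recipes"];
    for task in make_tasks(&paths) {
        println!("{}", task());
    }
}
```

Because the closures are built at runtime, the list of pages can come from anywhere, which is what makes the closure example "programmatic."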

Speaker:

I mean, even if you don't know too, too much about Rust, just, just looking

Speaker:

at these, you know, you can go to this examples, directory, cut and

Speaker:

paste, make some modifications and, you know, simple modifications and use

Speaker:

this to test your own Drupal website in a pretty sophisticated fashion.

Speaker:

This one is not as helpful yet.

Speaker:

Conceptually it helps you with closures, but then there's

Speaker:

one called Drupal load test.

Speaker:

And that was literally written to load test every release of the memcache module.

Speaker:

Originally, we were doing it in JMeter and then we replaced it with Locust and

Speaker:

now we use Goose and every time it's a huge step forward, but this one has a

Speaker:

lot more in the way of Drupal examples, much more

Speaker:

You can cut and paste.

Speaker:

That said I believe it was Fabian who suggested we should

Speaker:

load test against Umami.

Speaker:

So if, if your goal is to write a load test for a Drupal 8 or Drupal

Speaker:

9 website, this is absolutely the example to look at. It's broken into

Speaker:

useful pieces: common.rs has functions that you can just call

Speaker:

directly from your own load test.

Speaker:

And it, does a lot of things that aren't generic to Drupal websites.

Speaker:

Cause you can see here, it's defining a list of all nodes and whatnot in Umami.

Speaker:

But in any case, when we've written load

Speaker:

tests for customers, we've borrowed heavily from this code base because

Speaker:

it is a great starting place.

Speaker:

I actually love that you finished this Umami load

Speaker:

test, because one of the things

Speaker:

The Drupal community has been looking for since at least

Speaker:

2015 is essentially a profile which we can use for performance

Speaker:

regression testing in core.

Speaker:

And this now would essentially allow that: just spin it up somewhere, load test

Speaker:

it, put some New Relic on it, and then

Speaker:

essentially we could look at what's slow, what's fast, did we regress, et cetera.

Speaker:

So it's not only useful for memcache, but with Umami being finally complex

Speaker:

enough to resemble a real site I think we could get some meaningful data out of

Speaker:

it for future performance work in core.

Speaker:

So that's really cool.

Speaker:

It's funny you should mention that.

Speaker:

I spun this up to test earlier this week.

Speaker:

And I found that there are certain paths that are struggling under

Speaker:

the amount of load that we're putting against our little install.

Speaker:

And so there is some functionality that was being tested that I disabled on the

Speaker:

load test, so as not to load those paths.

Speaker:

For example, I disabled searching: the load test does searching in English and in

Speaker:

Spanish, and both are just disabled in our current test, as is filling out the contact form.

Speaker:

And I also disabled the user that logs in, edits an

Speaker:

article, and saves it, which flushes the cache.

Speaker:

Those are the things. It'd be great, at some point,

Speaker:

and maybe in this series, we'll do that.

Speaker:

We could dig into why we had to disable them, how to optimize and fix it so

Speaker:

that we don't have to, and everything

Speaker:

can be tested.

Speaker:

Essentially, what we are testing today and what we're seeing today is, we

Speaker:

are doing the big event scenario.

Speaker:

But essentially you get featured on Slashdot, on Reddit, on wherever, like,

Speaker:

and people are just crashing your site and all coming in, like angry geese.

Speaker:

And essentially they are all wanting to get the recipes from Umami

Speaker:

because they are so hot right now.

Speaker:

I mean, they're, they're perfect.

Speaker:

And because of that, we cannot afford at this stage

Speaker:

essentially to allow people to do

Speaker:

things that are really complex.

Speaker:

So what often happens in the real world, in a scenario when a big day like Black

Speaker:

Friday comes in, is that you actually disable or rate limit functionality that

Speaker:

is very costly on the site like searching.

Speaker:

And you'll also tell your editors, Hey, please prepare all content before

Speaker:

this date and make as few content changes as possible while it's going on, because

Speaker:

it always needs to clear some caches.

Speaker:

And obviously that can lead to

Speaker:

worse performance. But that's pretty cool.

Speaker:

So, this is the big event scenario

Speaker:

We have set up here,

Speaker:

so let's give it a run.

Speaker:

what we're going to see is, well, first of all, Goose is gonna

Speaker:

run from an instance in AWS.

Speaker:

we just spun it up here about 15 minutes ago.

Speaker:

we'll spin up some other instances as we test to demonstrate how different

Speaker:

hardware can perform different tests.

Speaker:

But a high-level summary: Goose makes really, really good use

Speaker:

of available CPU power, and

Speaker:

Ultimately it consistently tends to bottleneck on bandwidth

Speaker:

because it can generate as much load as your uplink supports.

Speaker:

so that's, you know, that's a good thing that that's what we're trying to do

Speaker:

here and you can of course control it.

Speaker:

You don't always have to just flood things with

Speaker:

as much traffic as possible.

Speaker:

But for today, what we're most interested in is just seeing what we

Speaker:

can do, what traffic we can throw at it.

Speaker:

High level specs on this box.

Speaker:

Like how many cores is this thing?

Speaker:

this is an eight core server.

Speaker:

It's Amazon's c5a.2xlarge, which should answer all your questions.

Speaker:

The 2xlarge means it's got eight CPUs, and the c5 line

Speaker:

is pretty powerful with CPU.

Speaker:

I actually chose it.

Speaker:

This particular one, because eight cores is what I generally use

Speaker:

when I'm, when I'm load testing.

Speaker:

And I wanted something with a reasonably fast uplink and this one has, I think

Speaker:

it says up to, I think it was 10 gigabits or maybe it's five gigabits.

Speaker:

but it's enough to put some significant load and to show that, you know,

Speaker:

Goose is actually bottlenecking on uplink, not on available

Speaker:

CPU resources.

Speaker:

So the public IP, this is up and running.

Speaker:

So I already SSHed into this, and in the Goose directory on this server is

Speaker:

just a Git checkout of the current head of Goose, the main branch.

Speaker:

And I'm going to see if I can quickly find in my history the commands that I ran.

Speaker:

So we'll start off with just a simple hundred users.

Speaker:

This is an SSH into this AWS instance, and we're going to just launch a

Speaker:

very small Goose load test, a Goose attack, against the Umami website that

Speaker:

we're running. To quickly talk through the flags: cargo run is because

Speaker:

instead of compiling a binary and then running the binary I'm using cargo to

Speaker:

manage that, which allows me to tweak the code and whatnot and recompile.

Speaker:

The --release is important so you get an optimized build rather than a slow debug build.

Speaker:

For any load test you're running, you're going to want to use --release.

Speaker:

--example is because we're just running an existing example

Speaker:

that we've provided, the Umami one.

Speaker:

And then, after the bare --, the --host is where we actually

Speaker:

start configuring Goose itself.

Speaker:

We're no longer in cargo; we're in Goose.

Speaker:

We're load testing against umami.tag1.io.

Speaker:

The -v will give us a little bit of verbosity.

Speaker:

You don't want to go past

Speaker:

one -v, because then it gets too verbose.

Speaker:

It starts showing verbosity from all the libraries and whatnot, and

Speaker:

that will impact your results because it's a lot of overhead.

Speaker:

But one -v is great.

Speaker:

--log-file basically takes everything in -v and dumps it to a

Speaker:

log file, which we're defining as

Speaker:

goose.log.

Speaker:

And that's because of the -g that follows. Then we have a

Speaker:

--debug-file writing to a debug log, because if something goes wrong Goose will capture

Speaker:

everything: all the headers, everything that was requested and everything

Speaker:

that was returned, which gives us a ton of insight into what's going wrong,

Speaker:

whether something's bottlenecking or there's something

Speaker:

wrong on the server or, or who knows.

Speaker:

but we've found that unbelievably helpful in actually troubleshooting

Speaker:

with customers.

Speaker:

-t 1m means that this is only gonna run for one minute, and

Speaker:

then it's going to stop itself.

Speaker:

-u 100 means we're gonna simulate 100 users, and the -r 3.4 means

Speaker:

that we're starting 3.4 users

Speaker:

per second.
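Put together, the invocation walked through above looks roughly like this. The flag behavior follows what's described in the talk; exact flags and short forms can vary between Goose versions, so treat this as a sketch rather than a definitive command:

```shell
# Everything before the bare `--` is for cargo; everything after it goes
# to Goose itself:
#   --host        the site under test
#   -v            one level of runtime verbosity (more gets too noisy)
#   --log-file    where output is written, at the level set by -g
#   --debug-file  captures full requests/responses when something fails
#   -t 1m         run for one minute
#   -u 100        simulate 100 users
#   -r 3.4        start 3.4 users per second
cargo run --release --example umami -- \
  --host https://umami.tag1.io \
  -v -g --log-file goose.log \
  --debug-file goose-debug.log \
  -t 1m -u 100 -r 3.4
```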

Speaker:

So we'll go ahead and give that a try and see what happens.

Speaker:

There are a couple of warnings;

Speaker:

I've commented out some code.

Speaker:

So as it's starting, it's showing users launching in English and Spanish.

Speaker:

So now if we go to Fastly momentarily, we should start seeing some

Speaker:

traffic and we would expect it.

Speaker:

The server is actually located on the West coast of the United States.

Speaker:

So we'd expect to see traffic there.

Speaker:

It obviously saw something because it flushed it.

Speaker:

There we go.

Speaker:

So now we're seeing, right here, traffic coming from Seattle, and we're currently

Speaker:

doing 140, 160 requests per second.

Speaker:

It's still ramping up.

Speaker:

down below you can see that it's ramping up to 95 megabits a second.

Speaker:

So there's traffic happening.

Speaker:

If we look on the server itself, we can see that Goose knows it's running.

Speaker:

it has successfully ramped everything up.

Speaker:

And so it's going to run for 60 seconds.

Speaker:

just generating a little load. 60 seconds

Speaker:

is only enough for this demo,

Speaker:

but you can see things generally stabilize and, you know,

Speaker:

it's pulling stuff out of the cache.

Speaker:

What's really important here is

Speaker:

this hit rate going up. This is the cache warming up, and that's going to

Speaker:

be essential to scale up this test and put serious load against it.

Speaker:

You can see here that we're capping out at about 350, 380 requests

Speaker:

per second with the hundred users that we're simulating. There are

Speaker:

more requests than users, obviously, because it's not just

Speaker:

requesting the index file, but it's also requesting elements from the page.

Speaker:

It's the CSS and the images and whatnot.

Speaker:

So all of that's loading. The load test finished; there's a delay in Fastly.

Speaker:

You saw there was a delay to start up;

Speaker:

there's also a delay to go down.

Speaker:

If we look at the results, you can see that everything ran.

Speaker:

It prints the final statistics. There are lots of statistics, and there are flags

Speaker:

You can use to change what it shows you.

Speaker:

it breaks it down per task, showing you that, you know, we've

Speaker:

been loading the front page.

Speaker:

A basic page, different articles, different recipes, different nodes by

Speaker:

node ID. It shows you how many times each was done, and shows you the zero failures.

Speaker:

it was a total of 39.8 tasks per second.

Speaker:

and then you can also see how long each took so you can kind of get a

Speaker:

sense of which tasks are taking longer.

Speaker:

The recipe page is taking about a second to load, which is fairly slow

Speaker:

for a webpage, especially coming out of a CDN. Any ideas on why it's that

Speaker:

slow, Fabian? Would that be because of the amount of content on the page?

Speaker:

No, it might've been the first request, like on a cold cache?

Speaker:

We've got a minimum and a max.

Speaker:

The minimum is still 972 milliseconds.

Speaker:

you've probably already exhausted the bandwidth of the Amazon instance.

Speaker:

Definitely not.

Speaker:

Well, we're gonna ramp this up considerably;

Speaker:

we can get to about 2,000 users before we cap it out.

Speaker:

then it doesn't make sense.

Speaker:

So my thought process is that it's Hmm.

Speaker:

Okay.

Speaker:

Well, let's see per requests, so sorry.

Speaker:

So that's per task.

Speaker:

Okay.

Speaker:

Actually here it is what I was thinking because , per requests for

Speaker:

requesting slash it was let's see here.

Speaker:

Yeah, it's super, super fast.

Speaker:

An absolute maximum of 148 milliseconds.

Speaker:

The reason the task is slower is because it's making so many requests.

Speaker:

And even though each request is fast, the task time is probably going to include sleeps, and it's

Speaker:

going to include time while it's yielding to let another user use the box.

Speaker:

so that's a combination of that user doing the entire page load

Speaker:

task and it would be slower.

Speaker:

The actual requests themselves are very, very fast, like we would expect anyway.

Speaker:

it's not very exciting.

Speaker:

But it gives you a quick demonstration of what this does.

Speaker:

So the general idea here, then, is: now that we've seen what

Speaker:

it can do, let's increase it.

Speaker:

So all we're going to do is we're going to go from a hundred users to a thousand

Speaker:

users, a 10x increase, and instead of starting 3.4 a second, we're

Speaker:

going to go ahead and start 34 a second.

Speaker:

It's important to note that, even with the resources it takes to start that up, it

Speaker:

still has no trouble with that.

Speaker:

That reminds me one thing that I should be doing here.

Speaker:

That's useful.

Speaker:

I'm exiting out just so that I can cut and paste this string and

Speaker:

connect in with a second window.

Speaker:

And the reason is on that second window, what's helpful is if we run a SAR command.

Speaker:

So, like every 15 seconds, it creates a snapshot of what's going on;

Speaker:

that'll give us a little more insight into what's going on in the box.

Speaker:

For that, we're only going to be using

Speaker:

The CPU stats.

Speaker:

but that was what I found the most telling.
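For reference, the monitoring command being described would look something like this; the interval and count here are illustrative, not the exact values used in the demo:

```shell
# Report CPU utilization (sar -u) every 15 seconds, 40 times (10 minutes),
# in the second SSH window while the load test runs in the first.
sar -u 15 40
```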

Speaker:

So we'll go and do this again with a thousand users, starting 34 per second.

Speaker:

It would help if I was in the Goose directory.

Speaker:

All right.

Speaker:

The first thing it does is it builds a bunch of stuff in memory.

Speaker:

now that it's built, it's launching them.

Speaker:

If we switch to this view, we'll see that it starts using

Speaker:

more and more CPU as this spins up.

Speaker:

it only does a snapshot every 15 seconds and soon we should start

Speaker:

seeing more considerable traffic hitting the endpoint here.

Speaker:

All right.

Speaker:

There we go.

Speaker:

Traffic starting to hit.

Speaker:

It's going to ramp up considerably faster.

Speaker:

And one thing I noticed when I was testing this earlier: when

Speaker:

you throw 10 times as many users at a server, Goose

Speaker:

throws 10 times as much traffic.

Speaker:

so it's, it's pretty simple math.

Speaker:

You're going to see 10 times or slightly more requests per second and you know,

Speaker:

10 times as much bandwidth happening.

Speaker:

What's really important for me here is that we can actually

Speaker:

see a line going up for this, because this line is so important for

Speaker:

me when I'm doing performance testing, it's what I call the scalability test.

Speaker:

Because, and we can do that in another session, if you combine this

Speaker:

test with New Relic, essentially, and we then do a scalability test.

Speaker:

There's a very slow ramp up.

Speaker:

You can actually see where a site starts to get into trouble as

Speaker:

load increases, because that is when your performance goes from linear,

Speaker:

or even flat, not impacted, to exponential, and

Speaker:

that is when you get problems.

Speaker:

So that's a great way to do it.

Speaker:

That's why we have this ramp-up variable, because it's so

Speaker:

useful in testing performance.

Speaker:

So the load test just finished.

Speaker:

It sustained 1.3, 1.4 gigabits per second.

Speaker:

And those eight CPUs, you know, they're 55, 60% idle.

Speaker:

So the server has way, way more capacity to continue generating more load.

Speaker:

But the sheer amount of traffic is going to be our breakdown point

Speaker:

when we try to keep ramping up the load test.

Speaker:

Let's go ahead and quickly go back and look at the per-request metrics.

Speaker:

You can see that at this point we were making 3,300 requests per second.

Speaker:

None of them failed. The majority of them are static assets, which

Speaker:

we're loading off all the pages that we're loading. And then the actual page

Speaker:

load times continued to be quite fast.

Speaker:

an average of a little over eight milliseconds.

Speaker:

They stay this fast because they're coming out of Fastly.

Speaker:

They're not coming out of the endpoint.

Speaker:

But yeah the load test is continuing to look quite good.

Speaker:

So the next thing we'll do is we'll double the ramp up again to 64, and we'll

Speaker:

go from a thousand users to 2000 users.

Speaker:

and see what happens when we double the load.

Speaker:

As before it takes a moment while it's creating user states and it takes a little

Speaker:

longer because it's creating twice as many; it's an area I want to optimize.

Speaker:

and now it's ramping them up.

Speaker:

And again, the server has no problem.

Speaker:

even though we're doing twice as many per second again, the server can handle this.

Speaker:

and soon we'll start seeing traffic hitting on Fastly

Speaker:

and boom, there it goes.

Speaker:

there's a delay obviously for the reporting side of things, but once again,

Speaker:

you're going to see that nice ramp up and it's going to ramp up even further.

Speaker:

you can also see it's using more CPU this time as expected.

Speaker:

we're down to 13% idle and it's still ramping up.

Speaker:

But that's, that's great.

Speaker:

And then down here, sure enough, we're up to 2.6 gigabits and still climbing.

Speaker:

so as noted, it's capable of doubling as we double how much we throw at it.

Speaker:

So Goose is more or less scaling linearly with what's available to it,

Speaker:

which is great to see.

Speaker:

it's using about 90% of the CPU resources sustaining great load.

Speaker:

and of course, you know, what's, I guess what's also worth seeing is

Speaker:

that this website is continuing to work.

Speaker:

It should continue to load, because Fastly is taking most of the traffic; it's

Speaker:

not actually hitting the backend server.

Speaker:

So even though I'm logged in I can still use the website even under this much load.

Speaker:

With that all said and done,

Speaker:

See here, it did finish.

Speaker:

we were seeing up to 7,000 requests per second, which added up to, or

Speaker:

yes, and sure enough, 6,600 requests per second is what

Speaker:

Goose believed that it was doing.

Speaker:

Again, there were zero failures, which is good to see.

Speaker:

Fastly is doing a great job.

Speaker:

When it says zero failures, it's important to note that Goose

Speaker:

is not guessing on that. For every single request it makes, for,

Speaker:

you know, 6,600 requests a second, it checks the status code to make

Speaker:

sure that it's what it expects.

Speaker:

So a 200. But then it's also analyzing the page that's returned.

Speaker:

And it's looking for keywords; it's looking for, you know, whether

Speaker:

it's the title of the page or whether it's a specific search term, and making

Speaker:

sure that really does exist on the page.

Speaker:

and it does it for 100% of the pages returned, not just a subset.
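The validation being described, status code plus a body keyword check on every response, can be sketched in plain Rust. This is a conceptual illustration, not the actual Goose API; the function name and HTML are made up for the example:

```rust
// Conceptual sketch of per-response validation: every response (not a
// sample) must have the expected status code AND contain an expected
// string, such as the page title or a search term.
fn validate_response(status: u16, body: &str, expected_text: &str) -> bool {
    status == 200 && body.contains(expected_text)
}

fn main() {
    let body = "<html><head><title>Umami</title></head><body>Recipes</body></html>";
    // A 200 whose body contains the expected title passes...
    assert!(validate_response(200, body, "<title>Umami</title>"));
    // ...but a 200 serving the wrong content still counts as a failure,
    // which is why checking only the status code is not enough.
    assert!(!validate_response(200, "<html>error page</html>", "<title>Umami</title>"));
    println!("validation checks passed");
}
```

Running a check like this on 100% of responses is what makes the "zero failures" number meaningful rather than a sampled estimate.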

Speaker:

Rust is incredibly efficient and a fantastic platform for

Speaker:

doing a load test like this.

Speaker:

do you have any questions at this point, Michael, or Fabian?

Speaker:

You said that it's doing

Speaker:

a hundred percent of the pages and checking.

Speaker:

Are you implying that other tools and systems

Speaker:

don't or?

Speaker:

Yeah.

Speaker:

What happens at scale is, to be able to scale up a

Speaker:

load to this size or even larger, like we're going to do, you start having to

Speaker:

use performance tricks, where you look at a percentage of them and you have

Speaker:

a pretty reasonable assumption that things are working, you know, check 25%.

Speaker:

If they all passed, you probably are okay.

Speaker:

but we don't have to do that.

Speaker:

And we're able to scale up.

Speaker:

I haven't found an uplink big enough that we can't handle that

Speaker:

with the available CPU power.

Speaker:

Essentially, Michael, people could also create this much load just by using ab

Speaker:

or httperf, et cetera, which is why people could say this is

Speaker:

nothing special. But what Jeremy is

Speaker:

telling us is that this load test, not only is it programmed in Goose, but

Speaker:

it does a lot more than just sending traffic and also analyzes the

Speaker:

traffic and analyzes all the pages.

Speaker:

And it's insane how efficient it is using the CPU.

Speaker:

Usually, if you want to create this much traffic, you would need several boxes,

Speaker:

several AWS boxes, to even do it, or do some parallelization,

Speaker:

et cetera, with these other tools.

Speaker:

So it's really fantastic.

Speaker:

One Goose, one instance.

Speaker:

You're good to go.

Speaker:

No complexity.

Speaker:

That's great.

Speaker:

So you know, the logical thing to do is to double everything

Speaker:

again and see what happens.

Speaker:

At least that's how it seems logical to me.

Speaker:

So we'll go from 2,000 to 4,000 users, and we'll create 128 a second.

Speaker:

I'm going to restart this SAR because I'm not sure it's going

Speaker:

to run long enough otherwise.

Speaker:

Um, yeah.

Speaker:

And the thing to see here is, again, it can

Speaker:

launch this many users this fast without problems.

Speaker:

Okay.

Speaker:

Yes.

Speaker:

So it can start them, which is great news.

Speaker:

what we're going to find is that we're going to hit a different

Speaker:

bottleneck this time though.

Speaker:

and it's not the server CPU.

Speaker:

It's going to be our uplink.

Speaker:

We stalled for a little while, creating user states, getting

Speaker:

everything ready to go, and then boom, all of the user threads start.

Speaker:

there's not much to watch here.

Speaker:

It's just going to scroll through all the users it creates.

Speaker:

So I'll keep it open on our CPU chart to see how much CPU it's using.

Speaker:

If there are errors, it'll become pretty apparent in various places.

Speaker:

So for now, we'll just keep an eye on this website.

Speaker:

I love it.

Speaker:

I love to just watch this in real time,

Speaker:

See the traffic coming in.

Speaker:

It's always my favorite part when Tag1 is doing load tests for various clients,

Speaker:

when we are doing this much load and then watching New Relic

Speaker:

going into insane numbers,

Speaker:

Watching Fastly, going to insane numbers.

Speaker:

It's like,

Speaker:

It's fun.

Speaker:

It's disappointing here:

Speaker:

we're not over the 4K requests per second on Fastly.

Speaker:

So we seem to have capped out here at 6,600 requests a second.

Speaker:

And sure enough, interestingly, even though

Speaker:

we have twice as many users, we're seeing the same amount of traffic.

Speaker:

One thing.

Speaker:

That's interesting.

Speaker:

If you compare this bandwidth chart to the previous bandwidth chart, it's flatter.

Speaker:

and so what's happening is the CPU has sufficient power.

Speaker:

It's able to make all of these requests, but the way that async

Speaker:

works, the requests are asynchronous:

Speaker:

whenever something stalls, it just goes on to another request.

Speaker:

And so it's able to flatten out how fast these requests are working.

Speaker:

So that load test finished. It has a relatively flat line

Speaker:

because the CPU had enough power to keep generating requests,

Speaker:

even though the bandwidth was limited.

Speaker:

What that shows is that if we could start up a server with more power,

Speaker:

we could do more interesting things with it, like generate even more load.

Speaker:

So let's do exactly that.

Speaker:

I'm going to exit out of here.

Speaker:

I went ahead and started another instance.

Speaker:

This one is an r5n.8xlarge, and the 8xlarge means it has

Speaker:

32 cores.

Speaker:

And I believe it's supposed to have a 25 gigabit uplink.

Speaker:

and that's great because we are needing more bandwidth.

Speaker:

So we'll connect into the server.

Speaker:

And as before, we'll connect into it twice so that we can see

Speaker:

how much we're stressing it out.

Speaker:

and there we go.

Speaker:

So inside the Goose directory we can spin up another load test.

Speaker:

Now that we're not capped by bandwidth, we hope, let's rerun our 4,000-user test.

Speaker:

Make sure there are no typos in this... looks good.

Speaker:

Goose debug file one minute, 4,000 users.

Speaker:

All right.

Speaker:

So , um,

Speaker:

Yup.

Speaker:

It's gonna start allocating memory.

Speaker:

And that takes a bit of CPU power.

Speaker:

We will be able to see on Fastly once this traffic starts hitting.

Speaker:

And the goal here is that, because we have more uplink,

Speaker:

we're able to generate more traffic.

Speaker:

the expectation absolutely is that that should work.

Speaker:

Let's see it start hitting Fastly.

Speaker:

the server doesn't have to work very hard because again, it has 32 cores.

Speaker:

So even though it's generating quite a few users, there's plenty of power there.

Speaker:

So that's not much of an issue now.

Speaker:

We've got the requests starting to come in.

Speaker:

And already you can see traffic on the Seattle PoP.

Speaker:

Uh, whoops.

Speaker:

There was a spike, which means there are errors, and sure enough.

Speaker:

So what we have going on here is a perfect storm of too many

Speaker:

requests and the server's down.

Speaker:

So there's no point in running any further.

Speaker:

what's happened is this was able to ramp up so fast and it was able to

Speaker:

hit enough pages that were not yet in the cache that it caused some errors.

Speaker:

It's worth taking a quick look and seeing here that we've

Speaker:

got data in our debug log.

Speaker:

It has 19 megs of data there.

Speaker:

but if you look, you can get some sense of what's happening and sure

Speaker:

enough here, what you can see is that we're not getting anything back.

Speaker:

The server's just not responding.

Speaker:

Oh, And actually I'm wrong.

Speaker:

Okay.

Speaker:

It wasn't what I thought it was.

Speaker:

The specific problem here is DNS stopped.

Speaker:

We killed DNS locally.

Speaker:

the solution fortunately is quite easy.

Speaker:

if we ping umami.tag1.io and we grab this IP address, we can

Speaker:

add it to our hosts file.

Speaker:

And I did have it.

Speaker:

Oh, it is there.

Speaker:

Okay.

Speaker:

Next theory.

Speaker:

Too many open files.

Speaker:

That's our problem.

Speaker:

All right.

Speaker:

So the other issue is that you need to increase your ulimit, and I go crazy

Speaker:

large so that we don't run into this.

Speaker:

I forgot we've spun up a new server.

Speaker:

I didn't make that permanent, so I have to set the limit again.
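The ulimit fix described here can be sketched as a couple of shell commands. This is a minimal sketch: the limit value is an arbitrary "go crazy large" number rather than a tuned recommendation, and where the hard limit is raised system-wide varies by distribution.

```shell
# Every socket a simulated Goose user opens counts against the
# open-file limit, so a common default of 1024 is far too low for
# thousands of users. Check the current soft limit:
ulimit -n

# Raise it for this shell session only -- the setting is not
# permanent, which is why a freshly spun-up server needs it again.
# If the hard limit blocks this, it has to be raised system-wide
# (e.g. in /etc/security/limits.conf) first.
ulimit -n 1000000 2>/dev/null || echo "hard limit too low; raise it system-wide first"
```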

Speaker:

Now, if we run again, everything should work a lot better.

Speaker:

And luckily, that should be all we need to do to be

Speaker:

able to run this test properly.

Speaker:

So the two things that I ran into when I was testing this were DNS

Speaker:

dying, where for me the quickest solution was adding it as the host, and not

Speaker:

having enough file handles, which is why we increase it with ulimit. With those,

Speaker:

it makes perfect sense.

Speaker:

If you want to load test, it can make sense to just

Speaker:

put in the IP address.

Speaker:

However, one needs to be careful when doing that with a distributed load test,

Speaker:

because otherwise you would be hitting the Fastly PoP of Seattle from China.

Speaker:

That's not what we want to do, obviously.

Speaker:

So yeah, but then we also don't want to load test DNS, so it

Speaker:

makes sense to cache it locally.

Speaker:

I was pinging from each server and then using the response to

Speaker:

the ping to set up the host.

Speaker:

So whatever resolves for that user, I just cache it in as the host, essentially.

Speaker:

and that works pretty well.
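The per-server DNS pinning described here can be sketched roughly as follows. It is only a sketch: the sed pattern assumes standard Linux ping output, and appending to /etc/hosts needs root, so that line is shown commented out.

```shell
# Extract the IP from the first line of ping output, e.g.:
#   PING umami.tag1.io (151.101.1.1) 56(84) bytes of data.
resolve_ip() {
  ping -c1 "$1" | sed -n 's/^PING [^(]*(\([0-9.]*\)).*/\1/p'
}

# Run this on each load-generating server so every one pins whatever
# its own resolver returned -- keeping each server on its nearest
# Fastly PoP instead of, say, hitting the Seattle PoP from China:
#   echo "$(resolve_ip umami.tag1.io) umami.tag1.io" | sudo tee -a /etc/hosts
```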

Speaker:

So now we have what we like to see.

Speaker:

We can see a proper ramp up happening and it's continuing

Speaker:

to ramp up fairly aggressively.

Speaker:

That's a lot of requests happening per second that Goose is making,

Speaker:

and every single one is validated.

Speaker:

And it's the real deal.

Speaker:

Plus we can see in the global PoP traffic that, you know,

Speaker:

we've hit a new level.

Speaker:

We went from where it was topping out at 4K.

Speaker:

Now we're talking about 8K.

Speaker:

As before, we were able to double it.

Speaker:

And sure enough, bandwidth is up at 4.8 to five gigabytes of

Speaker:

traffic pretty consistently.

Speaker:

Can we go even higher?

Speaker:

Yes.

Speaker:

In fact, we can. I just like doubling

Speaker:

and seeing how far we can get.

Speaker:

So let's double it again.

Speaker:

I'll go ahead and cancel this.

Speaker:

There's no need to let it run to completion at this point.

Speaker:

I'll let it do a clean shutdown.

Speaker:

Actually, you know what, let's just force it to quit.

Speaker:

So it's quicker.

Speaker:

128 we'll go to 256, and we'll double 4,000 to 8,000.

Speaker:

So now we're gonna simulate 8,000 users actively loading

Speaker:

the page very aggressively.
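The doubled run can be sketched as a Goose invocation along these lines. The -u/--users and -r/--hatch-rate flags are Goose's standard run-time options; the exact crate being run isn't shown in the demo, so running it via cargo against the Umami load-test project is an assumption, and the launch line is echoed so the sketch is side-effect free.

```shell
# Doubling both knobs from the previous run: hatch rate 128 -> 256
# (users started per second) and total users 4,000 -> 8,000.
USERS=8000
HATCH_RATE=256

# The load-test launch, targeting the demo's umami.tag1.io host:
echo cargo run --release -- --host https://umami.tag1.io -u "$USERS" -r "$HATCH_RATE"
```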

Speaker:

Yeah, let's start that.

Speaker:

We'll see whether or not our uplink can handle 8,000 users.

Speaker:

Go ahead.

Speaker:

While we're waiting for that,

Speaker:

could you add DNS caching to Goose itself, so that

Speaker:

it's only one request?

Speaker:

Yes.

Speaker:

I mean, it might get complicated because there's nothing to prevent your load

Speaker:

tests from hitting multiple domains.

Speaker:

But that's interesting.

Speaker:

We should look into that.

Speaker:

Do you want to file a feature request live on stage?

Speaker:

Nice.

Speaker:

I like the idea.

Speaker:

It'd be a good question for Narayan, whether or not he would like that

Speaker:

feature, but it seems great to me.

Speaker:

Whereas before we were using 50% of our server to generate traffic,

Speaker:

we'll see how much we end up using for this much traffic.

Speaker:

But our Fastly stats should start ramping up pretty quickly here,

Speaker:

and there we go.

Speaker:

There, we can see there are no errors because it's ramping up cleanly like that.

Speaker:

It's catching up.

Speaker:

We're already surpassing the original tests.

Speaker:

Now we're at eight K requests per second.

Speaker:

It's still going up.

Speaker:

12K, 16K.

Speaker:

Sure enough.

Speaker:

We were able to double it again.

Speaker:

So we're doing 20 to 23,000 requests a second and sustaining 9.2 gigabytes,

Speaker:

nine gigabytes of traffic a second with a nice steady line, I should add. You

Speaker:

know, it's not spiky traffic.

Speaker:

It's very consistent traffic, which is fantastic when you're trying to do a

Speaker:

load test. And the server, you know, is using its CPUs; there's

Speaker:

only 4% idle, but it's generating some impressive load.

Speaker:

This was, this was really cool to see.

Speaker:

Thank you guys so much for walking us through this.

Speaker:

We're going to do some follow-ups to this.

Speaker:

We're going to do some how-tos on testing with authenticated traffic.

Speaker:

We're going to show you how to spin up Gaggles and do

Speaker:

distributed load testing with Goose. And Fabian,

Speaker:

I'm going to take you up on your scalability webinar, show

Speaker:

people how all that works.

Speaker:

So please stay tuned for more.

Speaker:

We'll put all of the links that Jeremy and Fabian mentioned in the show notes.

Speaker:

Do you have any questions for us?

Speaker:

Please reach out to us at tag1teamtalks@tag1.com.

Speaker:

We'd love your feedback on this, questions about Goose, or even better

Speaker:

hit us up in the issue queues on GitHub.

Speaker:

And we love feature suggestions.

Speaker:

You know, what should we cover in the future on this?

Speaker:

You can see our past Tag1 Team Talks at tag1.com/TTT for Tag1 Team Talks,

Speaker:

And you can go to Tag1.com/goose to get a lot of links and information

Speaker:

about Goose. Jeremy, Fabian,

Speaker:

Again, a huge thank you for walking us through this today and to our listeners.

Speaker:

thanks for tuning in for another Tag1 Team Talk.
