How to Load Test with Goose - Part 1: Drupal 9 Umami on Pantheon with Fastly - Tag1 Team Talk

Speaker: 00:00:00

Hello, and welcome to another Tag1 Team Talk, the podcast

Speaker: 00:00:03

and blog of Tag1 Consulting.

Speaker: 00:00:05

Today, we're going to be talking about how to load test with Goose.

Speaker: 00:00:09

It's going to be running in AWS, and we're going to test against Drupal 9 with the

Speaker: 00:00:12

default Umami install, against a Tag1 Kubernetes cluster fronted by the Fastly

Speaker: 00:00:19

CDN.

Speaker: 00:00:20

I'm Michael Meyers, managing director of Tag1 Consulting.

Speaker: 00:00:24

And I'm joined today by Jeremy Andrews, the Founder and CEO of Tag1, along with

Speaker: 00:00:29

Fabian Franz, our VP of Technology.

Speaker: 00:00:32

Let's jump right in.

Speaker: 00:00:33

I'm excited to see how Goose works.

Speaker: 00:00:36

Jeremy, welcome.

Speaker: 00:00:37

Fabian, welcome.

Speaker: 00:00:38

Thank you guys so much for walking us through this.

Speaker: 00:00:41

Thank you.

Speaker: 00:00:42

Yeah, I'm going to start off by sharing my screen.

Speaker: 00:00:46

So if you're watching this on a video, you'll be able to

Speaker: 00:00:48

see what we're doing as we go.

Speaker: 00:00:50

To start off, quickly looking at Goose itself: Goose is written in Rust.

Speaker: 00:00:54

If you go to Goose.rs, it'll take you to the GitHub page,

Speaker: 00:00:59

and inside there you'll find a pretty normal-looking Rust codebase.

Speaker: 00:01:03

For today,

Speaker: 00:01:04

the part we care about is within the examples folder, and we've written

Speaker: 00:01:07

several examples, primarily to demonstrate how Goose can be used.

Speaker: 00:01:12

The first one is just called simple, and all it does is load an

Speaker: 00:01:15

endpoint; it shows you how to write the simplest of load tests.

Speaker: 00:01:19

It uses a few features, like wait times.

Speaker: 00:01:21

So it'll load a page, wait randomly between five and 15 seconds,

Speaker: 00:01:25

and then load the index page and the about page and whatnot.

Speaker: 00:01:28

It does simple stuff.

Speaker: 00:01:29

It logs in.
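The random pause described above (wait somewhere between five and 15 seconds before the next page load) can be sketched with the standard library alone. This is an illustration under assumptions, not Goose's API: the hand-rolled xorshift generator stands in for a real random-number crate, and the `XorShift64` and `random_wait` names are hypothetical.

```rust
use std::time::Duration;

/// Tiny xorshift64 pseudo-random generator. Hypothetical stand-in for a real
/// RNG crate so that this sketch has no external dependencies.
struct XorShift64(u64);

impl XorShift64 {
    fn next(&mut self) -> u64 {
        // Classic 13/7/17 xorshift; the state must never be zero.
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Pick a wait between `min_secs` and `max_secs`, inclusive. The slight
/// modulo bias is acceptable for a load-test pause.
fn random_wait(rng: &mut XorShift64, min_secs: u64, max_secs: u64) -> Duration {
    let span = max_secs - min_secs + 1;
    Duration::from_secs(min_secs + rng.next() % span)
}

fn main() {
    let mut rng = XorShift64(0x2545_F491_4F6C_DD1D);
    // Hypothetical page list mirroring the "simple" example.
    for path in ["/", "/about/"] {
        let wait = random_wait(&mut rng, 5, 15);
        println!("GET {path}, then pause {}s", wait.as_secs());
        // A real simulated user would sleep for `wait` here before its next task.
    }
}
```

The important property is that each simulated user pauses for a different, bounded amount of time, so requests don't arrive in lockstep.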

Speaker: 00:01:30

then there's another version of that, where it was written to use

Speaker: 00:01:32

closures, a fancy feature of Rust, and Fabian wrote that version.

Speaker: 00:01:38

I find closures more difficult to understand, but they're

Speaker: 00:01:40

very, very flexible for programmatically doing things.

Speaker: 00:01:43

So it's a super cool example to have in there.

Speaker: 00:01:46

So essentially, even if you don't know what a

Speaker: 00:01:51

closure is, what you can see is this:

Speaker: 00:01:53

there's a paths array at the top, or vector, as it's called in Rust.

Speaker: 00:01:57

And you can essentially just extend this vector with whatever pages you want.

Speaker: 00:02:02

You can draw that from a CSV, from wherever, and you can have a

Speaker: 00:02:06

programmatic load test, essentially.
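The idea behind the closure example, a vector of paths that can be extended from a CSV and turned into load-test tasks, can be sketched in plain Rust. This is a hypothetical illustration: the `Task` alias, `tasks_from_paths`, and `paths_from_csv` are made-up names, and Goose's own task and closure API is not reproduced here.

```rust
/// In this sketch a "task" is just a boxed closure that returns the request it
/// would make. The `Task` alias is hypothetical; Goose's own task type differs.
type Task = Box<dyn Fn() -> String>;

/// Turn each path into its own closure; `move` gives every closure its own copy.
fn tasks_from_paths(paths: Vec<String>) -> Vec<Task> {
    paths
        .into_iter()
        .map(|path| Box::new(move || format!("GET {path}")) as Task)
        .collect()
}

/// Read extra paths from a one-column CSV, skipping the header and blank lines.
fn paths_from_csv(csv: &str) -> Vec<String> {
    csv.lines()
        .skip(1)
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .map(String::from)
        .collect()
}

fn main() {
    let mut paths = vec!["/".to_string(), "/about/".to_string()];
    // Extend the vector programmatically, e.g. from a CSV export of URLs.
    paths.extend(paths_from_csv("path\n/en/recipes\n/es/recipes\n"));
    for task in tasks_from_paths(paths) {
        println!("{}", task());
    }
}
```

Because each closure captures its own path, adding pages to the test means adding data, not writing a new function per page.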

Speaker: 00:02:10

I mean, even if you don't know too much about Rust, just looking

Speaker: 00:02:13

at these, you can go to this examples directory, cut and

Speaker: 00:02:16

paste, make some simple modifications, and use

Speaker: 00:02:21

this to test your own Drupal website in a pretty sophisticated fashion.

Speaker: 00:02:25

This one is not as helpful yet.

Speaker: 00:02:29

Conceptually it helps you with closures, but then there's

Speaker: 00:02:31

one called Drupal load test.

Speaker: 00:02:33

And that was literally written to load test every release of the memcache module.

Speaker: 00:02:38

Originally, we were doing it in JMeter, then we replaced it with Locust, and

Speaker: 00:02:42

now we use Goose, and every time it's been a huge step forward. This one has

Speaker: 00:02:46

many more Drupal examples and much more

Speaker: 00:02:49

you can cut and paste.

Speaker: 00:02:51

That said, I believe it was Fabian who suggested we should

Speaker: 00:02:53

load test against Umami.

Speaker: 00:02:55

So if your goal is to write a load test for a Drupal 8 or Drupal

Speaker: 00:02:59

9 website, this is absolutely the example to look at. It's broken into

Speaker: 00:03:03

useful pieces: common.rs contains functions that you can call

Speaker: 00:03:08

directly from your own load test.

Speaker: 00:03:10

And it does a lot of things that aren't generic to all Drupal websites,

Speaker: 00:03:13

because, as you can see here, it's defining a list of all nodes and whatnot in Umami.

Speaker: 00:03:16

But in any case, when we've written load

Speaker: 00:03:21

tests for customers, we've borrowed heavily from this codebase because

Speaker: 00:03:24

it is a great starting place.

Speaker: 00:03:25

I actually love that you finished this Umami load

Speaker: 00:03:29

test, because one of the things

Speaker: 00:03:31

the Drupal community has been looking for since at least

Speaker: 00:03:34

2015 is essentially a profile we can use for performance

Speaker: 00:03:40

regression testing in core.

Speaker: 00:03:42

And this would now essentially allow us to just spin it up somewhere, load test

Speaker: 00:03:48

it, put New Relic on it, and then

Speaker: 00:03:50

essentially we could look at what's slow, what's fast, whether we regressed, et cetera.

Speaker: 00:03:55

So it's not only useful for memcache, but with Umami being finally complex

Speaker: 00:04:00

enough to resemble a real site, I think we could get some meaningful data out of

Speaker: 00:04:04

it for future performance work in core.

Speaker: 00:04:07

So that's really cool.

Speaker: 00:04:09

It's funny you should mention that.

Speaker: 00:04:10

I spun this up to test earlier this week.

Speaker: 00:04:13

And I found that there are certain paths that are struggling under

Speaker: 00:04:18

the amount of load that we're putting against our little install.

Speaker: 00:04:22

And so there is some functionality that was being tested that I disabled in the

Speaker: 00:04:27

load test so that it doesn't load those paths.

Speaker: 00:04:30

For example, I disabled searching. The load test does searching in English and in

Speaker: 00:04:35

Spanish, and both are disabled in our current test, as is filling out the contact form.

Speaker: 00:04:39

And I also disabled the user that logs in, edits an

Speaker: 00:04:45

article, and saves it, flushing the cache.

Speaker: 00:04:48

Those are the things it'd be great, at some point,

Speaker: 00:04:50

and maybe in this series we'll do that,

Speaker: 00:04:52

to dig into why we had to disable them and how to optimize and fix it so

Speaker: 00:04:55

that we don't have to, and everything

Speaker: 00:04:57

can be tested.

Speaker: 00:04:59

Essentially, what we are testing today and what we're seeing today is

Speaker: 00:05:02

the big event scenario.

Speaker: 00:05:05

Essentially, you get featured on Slashdot, on Reddit, wherever,

Speaker: 00:05:10

and people are all coming in and crashing your site, like angry geese.

Speaker: 00:05:15

And essentially they all want to get the recipes from Umami

Speaker: 00:05:20

because they are so hot right now.

Speaker: 00:05:21

I mean, they're perfect.

Speaker: 00:05:23

And because of that, we cannot afford at this stage

Speaker: 00:05:27

to allow people to do

Speaker: 00:05:30

things that are really complex.

Speaker: 00:05:32

So what often happens in the real world when a big day like Black

Speaker: 00:05:36

Friday comes in, is that you actually disable or rate limit functionality that

Speaker: 00:05:42

is very costly on the site, like searching.

Speaker: 00:05:44

And you'll also tell your editors, "Hey, please prepare all content before

Speaker: 00:05:48

this date and make as few content changes as possible while it's going on," because

Speaker: 00:05:53

every content change needs to clear some caches.

Speaker: 00:05:55

And obviously that can lead to

Speaker: 00:05:57

worse performance. But that's pretty cool.

Speaker: 00:06:00

So this is the big event scenario

Speaker: 00:06:03

we have set up here,

Speaker: 00:06:06

so let's give it a run.

Speaker: 00:06:08

What we're going to see is, well, first of all, Goose is going to

Speaker: 00:06:11

run from an instance in AWS.

Speaker: 00:06:13

We just spun it up about 15 minutes ago.

Speaker: 00:06:17

We'll spin up some other instances as we test to demonstrate how different

Speaker: 00:06:20

hardware performs.

Speaker: 00:06:22

But as a high-level summary, Goose makes really, really good use

Speaker: 00:06:25

of available CPU power, and

Speaker: 00:06:29

ultimately it consistently tends to bottleneck on bandwidth

Speaker: 00:06:32

because it can generate as much load as your uplink supports.

Speaker: 00:06:36

So that's a good thing; that's what we're trying to do

Speaker: 00:06:40

here. And you can of course control it.

Speaker: 00:06:42

You don't always have to just flood things with

Speaker: 00:06:44

as much traffic as possible.

Speaker: 00:06:45

But for today, what we're most interested in is just seeing what we

Speaker: 00:06:48

can do, what traffic we can throw at it.

Speaker: 00:06:50

High-level specs on this box?

Speaker: 00:06:52

Like how many cores is this thing?

Speaker: 00:06:54

This is an eight-core server.

Speaker: 00:06:56

It's Amazon's c5a.2xlarge, which should answer all your questions.

Speaker: 00:07:00

The 2xlarge means it's got eight CPUs, and the C5a line

Speaker: 00:07:04

is pretty powerful on CPU.

Speaker: 00:07:05

I actually chose

Speaker: 00:07:07

this particular one because eight cores is what I generally use

Speaker: 00:07:10

when I'm load testing.

Speaker: 00:07:12

And I wanted something with a reasonably fast uplink, and this one has, I think

Speaker: 00:07:17

it says, up to 10 gigabits, or maybe it's five gigabits.

Speaker: 00:07:21

But it's enough to put some significant load on and to show that

Speaker: 00:07:25

Goose is actually bottlenecking on uplink, not on available

Speaker: 00:07:28

CPU resources.

Speaker: 00:07:30

So here's the public IP; this is up and running.

Speaker: 00:07:32

I've already SSHed into this, and the Goose directory on this server is

Speaker: 00:07:38

just a Git checkout of the current head of Goose's main branch.

Speaker: 00:07:43

And I'm going to see if I can quickly find the commands I ran in my history.

Speaker: 00:07:48

So we'll start off with just a simple hundred users.

Speaker: 00:07:51

This is an SSH session into this AWS instance, and we're going to launch a

Speaker: 00:07:57

very small Goose load test, a Goose attack, against the Umami website that

Speaker: 00:08:02

we're running. To quickly talk through the flags: cargo run is because

Speaker: 00:08:07

instead of compiling a binary and then running it, I'm using cargo to

Speaker: 00:08:10

manage that, which allows me to tweak the code and whatnot and recompile.

Speaker: 00:08:15

The --release flag is important: it builds an optimized binary without debug symbols.

Speaker: 00:08:19

For any load test you run, you're going to want to use --release.

Speaker: 00:08:22

--example is there because we're just running an existing example

Speaker: 00:08:27

that we've provided, the Umami one.

Speaker: 00:08:28

And then the -- followed by --host is where we actually

Speaker: 00:08:31

start configuring Goose itself.

Speaker: 00:08:33

We're no longer in cargo; we're in Goose.

Speaker: 00:08:35

We're load testing against umami.tag1.io.

Speaker: 00:08:38

The -v flag will give us a little bit of verbosity.

Speaker: 00:08:41

You don't want to go past

Speaker: 00:08:42

one -v, because then it gets too verbose.

Speaker: 00:08:45

It starts showing verbosity from all the libraries and whatnot, and

Speaker: 00:08:49

that will impact your results because it's a lot of overhead.

Speaker: 00:08:51

But one -v is great.

Speaker: 00:08:53

--log-file basically takes everything in -v and dumps it to a

Speaker: 00:08:57

log file, which we're defining as

Speaker: 00:08:59

goose.log.

Speaker: 00:09:00

And that's because of the -g that follows. Then we have a --debug-file

Speaker: 00:09:04

written to a debug log, because if something goes wrong, Goose will capture

Speaker: 00:09:09

everything: all the headers, everything that was requested, and everything

Speaker: 00:09:11

that was returned, which gives us a ton of insight into what's going wrong,

Speaker: 00:09:16

whether something's bottlenecking or there's something

Speaker: 00:09:17

wrong on the server or who knows.

Speaker: 00:09:19

We've found that unbelievably helpful in actually troubleshooting

Speaker: 00:09:23

with customers.

Speaker: 00:09:25

-t 1m means that this is only going to run for one minute, and

Speaker: 00:09:28

then it's going to stop itself.

Speaker: 00:09:29

-u 100 means we're going to simulate 100 users, and -r 3.4 means

Speaker: 00:09:36

that we're starting 3.4 users

Speaker: 00:09:38

per second.
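The -u and -r flags together determine how long the ramp-up takes: users divided by the hatch rate. The sketch below works that out for the combinations used in this session. It is a hypothetical illustration: `ramp_up_secs` and `run_time_secs` are made-up helpers, and the run-time parser handles only a single unit, unlike Goose's own option parsing.

```rust
/// Seconds until every simulated user has launched: users divided by hatch rate.
fn ramp_up_secs(users: u32, hatch_rate: f64) -> f64 {
    f64::from(users) / hatch_rate
}

/// Hypothetical parser for single-unit run times such as "60s", "1m" or "2h".
/// Goose has its own parser; this only illustrates the arithmetic.
fn run_time_secs(s: &str) -> Option<u64> {
    let unit = s.chars().last()?;
    let number: u64 = s[..s.len() - unit.len_utf8()].parse().ok()?;
    match unit {
        's' => Some(number),
        'm' => number.checked_mul(60),
        'h' => number.checked_mul(3600),
        _ => None,
    }
}

fn main() {
    // (-u, -r) pairs used in this session.
    for (users, rate) in [(100, 3.4), (1000, 34.0), (2000, 64.0), (4000, 128.0)] {
        println!("-u {users} -r {rate}: all users launched after ~{:.1}s", ramp_up_secs(users, rate));
    }
    println!("-t 1m = {}s", run_time_secs("1m").unwrap());
}
```

With -u 100 -r 3.4 the last user starts after roughly 29.4 seconds. Scaling users and hatch rate together, as the later runs do, keeps the ramp-up at around 30 seconds (about 31.3 seconds for 2,000 at 64 a second and 4,000 at 128 a second).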

Speaker: 00:09:39

So we'll go ahead and give that a try and see what happens.

Speaker: 00:09:43

There are a couple of warnings.

Speaker: 00:09:44

I've commented out some code.

Speaker: 00:09:45

So it's starting, and it's showing users launching in English and Spanish.

Speaker: 00:09:49

So now if we go to Fastly momentarily, we should start seeing some

Speaker: 00:09:53

traffic.

Speaker: 00:09:56

The server is actually located on the West Coast of the United States,

Speaker: 00:09:58

so we'd expect to see traffic there.

Speaker: 00:10:00

It obviously saw something because it flushed it.

Speaker: 00:10:03

There we go.

Speaker: 00:10:04

So now it's seeing traffic coming from Seattle right here, and we're currently

Speaker: 00:10:09

doing 140 to 160 requests per second.

Speaker: 00:10:12

It's still ramping up.

Speaker: 00:10:13

Down below, you can see that it's ramping up to 95 megabits a second.

Speaker: 00:10:17

So there's traffic happening.

Speaker: 00:10:19

If we look on the server itself, we can see that Goose knows it's running.

Speaker: 00:10:23

It has successfully ramped everything up,

Speaker: 00:10:25

and so it's going to run for 60 seconds,

Speaker: 00:10:27

just generating a little load. Sixty seconds

Speaker: 00:10:30

is only enough for this demo,

Speaker: 00:10:31

but you can see things generally stabilize, and

Speaker: 00:10:36

it's pulling stuff out of the cache.

Speaker: 00:10:38

What's really important here is

Speaker: 00:10:39

this hit rate going up: this is the cache warming up, and that's going to

Speaker: 00:10:42

be essential to scale up this test and put serious load against it.

Speaker: 00:10:47

You can see here that we're capping out at about 350 to 380 requests

Speaker: 00:10:51

per second with the hundred users that we're simulating. There are

Speaker: 00:10:55

more requests than users, obviously, because it's not just

Speaker: 00:10:59

requesting the index file; it's also requesting elements from the page.

Speaker: 00:11:03

It's the CSS and the images and whatnot.
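Each page load turns into several requests because the page's CSS, images, and scripts are fetched too. A stdlib-only sketch of how such asset URLs could be pulled out of a page follows. It assumes a naive string scan is good enough for illustration (a real test would use an HTML parser), and `attribute_values` and `static_assets` are hypothetical names, not Goose functions.

```rust
/// Collect every value of `attr="..."` in `html` with a naive string scan.
/// Assumption: good enough for illustration; a real test would parse HTML.
fn attribute_values(html: &str, attr: &str) -> Vec<String> {
    let needle = format!("{attr}=\"");
    let mut values = Vec::new();
    let mut rest = html;
    while let Some(start) = rest.find(&needle) {
        let after = &rest[start + needle.len()..];
        match after.find('"') {
            Some(end) => {
                values.push(after[..end].to_string());
                rest = &after[end + 1..];
            }
            None => break, // unterminated attribute: stop scanning
        }
    }
    values
}

/// Assets a page load would also request: every `src`, plus stylesheet `href`s.
fn static_assets(html: &str) -> Vec<String> {
    let mut assets = attribute_values(html, "src");
    assets.extend(
        attribute_values(html, "href")
            .into_iter()
            .filter(|href| href.ends_with(".css")),
    );
    assets
}

fn main() {
    // Hypothetical page fragment; real Umami markup will differ.
    let html = r#"<link rel="stylesheet" href="/core/theme.css">
<img src="/files/recipe.jpg"><a href="/en/about">About</a>
<script src="/core/app.js"></script>"#;
    let assets = static_assets(html);
    println!("1 page request + {} asset requests = {} requests", assets.len(), 1 + assets.len());
}
```

So even a single simulated user loading this small fragment would make four requests, which is why the request rate sits well above the user count.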

Speaker: 00:11:05

So all of that's loading. The load test finished. There's a delay in Fastly.

Speaker: 00:11:09

You saw there was a delay to start up.

Speaker: 00:11:10

There's also a delay to go down.

Speaker: 00:11:12

If we look at the results, you can see that everything ran.

Speaker: 00:11:15

It prints the final statistics. There are lots of statistics, and there are flags

Speaker: 00:11:19

you can use to change what it shows you.

Speaker: 00:11:21

It breaks it down per task, showing you that we've

Speaker: 00:11:23

been loading the front page,

Speaker: 00:11:25

a basic page, different articles, different recipes, different nodes by

Speaker: 00:11:29

node ID. It shows you how many times each was done and shows you the zero failures.

Speaker: 00:11:34

It was a total of 39.8 tasks per second.

Speaker: 00:11:37

And then you can also see how long each took, so you can get a

Speaker: 00:11:40

sense of which tasks are taking longer.

Speaker: 00:11:42

The recipe page is taking about a second to load, which is fairly slow

Speaker: 00:11:45

for a webpage, especially coming out of a CDN. Any ideas on why it's that

Speaker: 00:11:48

slow, Fabian, would that be because of the amount of content on the page?

Speaker: 00:11:53

No, it might have been the first request, like on a cold cache?

Speaker: 00:11:57

We've got a minimum and a max.

Speaker: 00:12:00

The minimum is still 972 milliseconds.

Speaker: 00:12:03

You've probably already exhausted the bandwidth of the Amazon instance.

Speaker: 00:12:07

Definitely not.

Speaker: 00:12:08

We're going to ramp this up considerably.

Speaker: 00:12:10

We can get to about 2,000 users before we cap it out.

Speaker: 00:12:13

Then it doesn't make sense.

Speaker: 00:12:18

So my thought process is that it's... hmm.

Speaker: 00:12:21

Okay.

Speaker: 00:12:22

Well, let's see per request. Sorry,

Speaker: 00:12:24

so that's per task.

Speaker: 00:12:25

Okay.

Speaker: 00:12:25

Actually, here is what I was thinking of, because per request, for

Speaker: 00:12:29

requesting /, it was... let's see here.

Speaker: 00:12:33

Yeah, it's super, super fast.

Speaker: 00:12:34

An absolute maximum of 148 milliseconds.

Speaker: 00:12:38

The reason it's slower is that it's making so many requests in those tasks.

Speaker: 00:12:42

And even though each request is fast, the task time probably includes sleeps, and it's

Speaker: 00:12:45

going to include time while it's yielding to let another user use the box.

Speaker: 00:12:49

So that's a combination of that user doing the entire page load

Speaker: 00:12:53

task, so it would be slower.

Speaker: 00:12:55

The actual requests themselves are very, very fast, like we would expect anyway.
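The difference between per-task and per-request timings can be made concrete with a toy model: a task's time covers every request plus any pauses, while per-request statistics only see the requests. This is a hypothetical sketch, not how Goose records its metrics internally, and the millisecond values in `main` are made up for illustration.

```rust
use std::time::Duration;

/// One step a simulated user performs inside a task (hypothetical model).
enum Step {
    Request(Duration),
    Pause(Duration), // a sleep, or time spent yielding to other users
}

/// Task time is wall-clock time across every step, pauses included.
fn task_time(steps: &[Step]) -> Duration {
    steps
        .iter()
        .map(|step| match step {
            Step::Request(d) | Step::Pause(d) => *d,
        })
        .sum()
}

/// Per-request statistics only ever see the requests themselves.
fn slowest_request(steps: &[Step]) -> Option<Duration> {
    steps
        .iter()
        .filter_map(|step| match step {
            Step::Request(d) => Some(*d),
            Step::Pause(_) => None,
        })
        .max()
}

fn main() {
    let ms = Duration::from_millis;
    // Hypothetical page-load task: the HTML, two assets, then a pause.
    let steps = [
        Step::Request(ms(8)),
        Step::Request(ms(20)),
        Step::Request(ms(15)),
        Step::Pause(ms(900)),
    ];
    println!("task: {:?}, slowest request: {:?}", task_time(&steps), slowest_request(&steps));
}
```

In this made-up task no single request takes more than 20 ms, yet the task takes 943 ms, which is the same pattern as the recipe page above.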

Speaker: 00:13:00

It's not very exciting,

Speaker: 00:13:01

but it gives you a quick demonstration of what this does.

Speaker: 00:13:04

So the general idea here is, now that we've seen what

Speaker: 00:13:08

it can do, let's increase it.

Speaker: 00:13:11

So all we're going to do is go from a hundred users to a thousand

Speaker: 00:13:13

users, a 10x increase, and instead of starting 3.4 a second, we're

Speaker: 00:13:19

going to start 34 a second.

Speaker: 00:13:20

It's important to note that, in terms of the resources it takes to start that up,

Speaker: 00:13:24

it has no trouble with that.

Speaker: 00:13:26

That reminds me of one thing that I should be doing here

Speaker: 00:13:29

that's useful.

Speaker: 00:13:30

I'm exiting out just so that I can cut and paste this string and

Speaker: 00:13:33

connect in with a second window.

Speaker: 00:13:35

The reason is that on that second window, it's helpful to run a sar command,

Speaker: 00:13:39

so that every 15 seconds it creates a snapshot of what's going on.

Speaker: 00:13:43

That'll give us a little more insight into what's happening on the box.

Speaker: 00:13:45

For that, we're only going to be using

Speaker: 00:13:47

the CPU stats,

Speaker: 00:13:49

but that's what I found most telling.
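The idle figures quoted later in this session come from sar's CPU report. As a small aid for post-processing such output, here is a sketch that turns one aggregate line into a busy percentage. It assumes output along the lines of `sar -u 15` (the exact command used in the session isn't shown), that %idle is the last column, and a '.' decimal separator; `busy_percent` is a hypothetical helper.

```rust
/// CPU busy percentage (100 - %idle) from one line of `sar -u` output.
/// Only the aggregate "all" rows are used; headers and other rows give None.
/// Assumes %idle is the last column and a '.' decimal separator (locale-dependent).
fn busy_percent(line: &str) -> Option<f64> {
    let fields: Vec<&str> = line.split_whitespace().collect();
    if !fields.contains(&"all") {
        return None;
    }
    let idle: f64 = fields.last()?.parse().ok()?;
    Some(100.0 - idle)
}

fn main() {
    // Hypothetical sample output in the `sar -u` layout.
    let output = "\
12:00:15 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
12:00:30 AM     all     35.10      0.00     10.20      0.00      0.00     54.70
12:00:45 AM     all     76.00      0.00     11.00      0.00      0.00     13.00";
    for busy in output.lines().filter_map(busy_percent) {
        println!("CPU busy: {busy:.1}%");
    }
}
```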

Speaker: 00:13:52

So we'll go in and do this again with a thousand users, starting 34 a second.

Speaker: 00:13:56

It would help if I was in the Goose directory.

Speaker: 00:14:00

All right.

Speaker: 00:14:01

The first thing it does is it builds a bunch of stuff in memory.

Speaker: 00:14:03

Now that it's built, it's launching them.

Speaker: 00:14:05

If we switch to this view, we'll see that it starts using

Speaker: 00:14:08

more and more CPU as this spins up.

Speaker: 00:14:12

It only does a snapshot every 15 seconds, and soon we should start

Speaker: 00:14:16

seeing more considerable traffic hitting the end point here.

Speaker: 00:14:22

All right.

Speaker: 00:14:22

There we go.

Speaker: 00:14:22

Traffic starting to hit.

Speaker: 00:14:24

It's going to ramp up considerably faster.

Speaker: 00:14:26

And one thing I noticed when I was testing this earlier: when

Speaker: 00:14:30

you throw 10 times as many users at a server, Goose

Speaker: 00:14:34

throws 10 times as much traffic.

Speaker: 00:14:36

So it's pretty simple math.

Speaker: 00:14:38

You're going to see 10 times as many requests per second, or slightly more, and

Speaker: 00:14:42

10 times as much bandwidth happening.

Speaker: 00:14:45

What's really important for me here is that we can actually

Speaker: 00:14:48

see a line going up for this, because this line is so important for

Speaker: 00:14:52

me when I'm doing performance testing, it's what I call the scalability test.

Speaker: 00:14:57

We can do that in another session, but if you combine this

Speaker: 00:15:01

test with New Relic and then do a scalability test

Speaker: 00:15:06

with a very slow ramp-up,

Speaker: 00:15:07

you can actually see where a site starts to get into trouble,

Speaker: 00:15:12

because that is when your performance goes from linear

Speaker: 00:15:17

or even flat, meaning not impacted, to exponential, and

Speaker: 00:15:21

that is when you get problems.

Speaker: 00:15:23

So that's a great way to do it.

Speaker: 00:15:24

That's why we have this ramp-up variable, because it's so, so

Speaker: 00:15:27

useful in testing performance.

Speaker: 00:15:31

So the load test just finished.

Speaker: 00:15:32

It sustained 1.3 to 1.4 gigabits per second.

Speaker: 00:15:35

Those eight CPUs are 55 to 60% idle.

Speaker: 00:15:41

So the server has way, way more capacity to continue generating load,

Speaker: 00:15:46

but the sheer amount of traffic is going to be our breaking point

Speaker: 00:15:51

when we try to keep ramping up the load test.

Speaker: 00:15:54

Let's go ahead and quickly go back and look at the per-request metrics.

Speaker: 00:15:57

You can see that at this point we were making 3,300 requests per second.

Speaker: 00:16:01

None of them failed. The majority of them are static assets from

Speaker: 00:16:05

all the pages that we're loading. And the actual page

Speaker: 00:16:08

load times continued to be quite fast.

Speaker: 00:16:11

An average of a little over eight milliseconds.

Speaker: 00:16:13

They stay this fast because they're coming out of Fastly.

Speaker: 00:16:16

They're not coming out of the endpoint.

Speaker: 00:16:18

But yeah, the load test is continuing to look quite good.
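The bandwidth and request figures from this 1,000-user run can be combined into a rough average response size, which also predicts what doubling the load will need. A hedged arithmetic sketch: it uses the midpoint of the quoted 1.3 to 1.4 Gbit/s, assumes decimal gigabits, and averages over a mix of HTML pages and static assets; the helper names are hypothetical.

```rust
/// Average bytes per request implied by a bandwidth reading and a request rate.
/// Assumes decimal units (1 Gbit = 1e9 bits) and 8 bits per byte.
fn avg_bytes_per_request(gbits_per_sec: f64, requests_per_sec: f64) -> f64 {
    gbits_per_sec * 1e9 / 8.0 / requests_per_sec
}

/// Bandwidth in Gbit/s needed to sustain a request rate at a given average size.
fn required_gbps(requests_per_sec: f64, bytes_per_request: f64) -> f64 {
    requests_per_sec * bytes_per_request * 8.0 / 1e9
}

fn main() {
    // Figures quoted for the 1,000-user run: ~1.3-1.4 Gbit/s (midpoint used) at ~3,300 req/s.
    let bytes = avg_bytes_per_request(1.35, 3_300.0);
    println!("~{:.0} KB per request on average", bytes / 1_000.0);
    // Doubling the request rate with the same mix roughly doubles the bandwidth needed.
    println!("6,600 req/s needs ~{:.1} Gbit/s", required_gbps(6_600.0, bytes));
}
```

That works out to roughly 51 KB per request, and about 2.7 Gbit/s for 6,600 requests per second, so the bandwidth of the load-generating box becomes the limiting factor quickly.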

Speaker: 00:16:20

So the next thing we'll do is double the ramp-up again, to 64 a second, and we'll

Speaker: 00:16:25

go from a thousand users to 2,000 users

Speaker: 00:16:27

and see what happens when we double the load.

Speaker: 00:16:33

As before, it takes a moment while it's creating user states, and it takes a little

Speaker: 00:16:37

longer because it's creating twice as many. It's an area I want to optimize.

Speaker: 00:16:42

And now it's ramping them up.

Speaker: 00:16:43

And again, the server has no problem.

Speaker: 00:16:45

Even though we're launching twice as many per second again, the server can handle this.

Speaker: 00:16:50

And soon we'll start seeing traffic hitting Fastly,

Speaker: 00:16:59

and boom, there it goes.

Speaker: 00:17:00

There's a delay, obviously, on the reporting side of things, but once again,

Speaker: 00:17:04

you're going to see that nice ramp up and it's going to ramp up even further.

Speaker: 00:17:07

You can also see it's using more CPU this time, as expected.

Speaker: 00:17:10

We're down to 13% idle and it's still ramping up.

Speaker: 00:17:13

But that's, that's great.

Speaker: 00:17:14

And then down here, sure enough, we're up to 2.6 gigabits and still climbing.

Speaker: 00:17:18

So, as noted, it's capable of doubling as we double how much we throw at it.

Speaker: 00:17:23

So Goose scales more or less linearly with what's available to it,

Speaker: 00:17:26

which is great to see.

Speaker: 00:17:27

It's using about 90% of the CPU resources, sustaining great load.

Speaker: 00:17:31

And of course, I guess what's also worth seeing is

Speaker: 00:17:34

that this website continues to

Speaker: 00:17:36

load, because Fastly is actually taking most of the traffic. It's

Speaker: 00:17:41

not actually hitting the backend server.

Speaker: 00:17:42

So even though I'm logged in, I can still use the website under this much load.

Speaker: 00:17:46

With that all said and done,

Speaker: 00:17:49

see here, it did finish.

Speaker: 00:17:51

We were seeing up to 7,000 requests per second, and

Speaker: 00:17:56

sure enough, 6,600 requests per second is what

Speaker: 00:18:00

Goose believed it was doing.

Speaker: 00:18:02

Again, there were zero failures, which is good to see.

Speaker: 00:18:04

Fastly is doing a great job.

Speaker: 00:18:06

When it says zero failures, it's important to note that Goose is

Speaker: 00:18:10

not guessing. For every single request it makes, at

Speaker: 00:18:13

6,600 requests a second, it checks the status code to make

Speaker: 00:18:17

sure that it's what it expects.

Speaker: 00:18:19

So, a 200. But then it's also analyzing the page that's returned.

Speaker: 00:18:22

It's looking for keywords, whether

Speaker: 00:18:25

it's the title of the page or a specific search term, and making

Speaker: 00:18:29

sure that really does exist on the page.

Speaker: 00:18:31

And it does it for 100% of the pages returned, not just a subset.
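The per-response check described above (status code first, then expected text such as the page title) can be sketched as a small validation function that runs on every response rather than a sample. This is a hypothetical illustration, not Goose's validation API; `Check`, `validate`, and the example titles are made up.

```rust
/// Outcome of checking one response (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum Check {
    Ok,
    BadStatus(u16),
    MissingText(String),
}

/// Check the status code, then require every expected string in the body.
fn validate(status: u16, body: &str, expected_status: u16, must_contain: &[&str]) -> Check {
    if status != expected_status {
        return Check::BadStatus(status);
    }
    for text in must_contain {
        if !body.contains(*text) {
            return Check::MissingText((*text).to_string());
        }
    }
    Check::Ok
}

fn main() {
    // Hypothetical responses: (status, body).
    let responses = [
        (200, "<title>Recipes | Umami Food Magazine</title>"),
        (200, "<title>Oops</title>"),
        (503, ""),
    ];
    // Every response is validated; nothing is sampled.
    let failures = responses
        .iter()
        .filter(|(status, body)| validate(*status, body, 200, &["Recipes"]) != Check::Ok)
        .count();
    println!("{failures} of {} responses failed validation", responses.len());
}
```

A status code alone would have accepted the second response; checking the content is what catches an error page served with a 200.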

Speaker: 00:18:35

Rust is incredibly efficient and a fantastic platform for

Speaker: 00:18:38

doing a load test like this.

Speaker: 00:18:40

Do you have any questions at this point, Michael or Fabian?

Speaker: 00:18:44

You said that it's checking

Speaker: 00:18:44

a hundred percent of the pages.

Speaker: 00:18:46

Are you implying that other tools and systems

Speaker: 00:18:49

don't?

Speaker: 00:18:51

Yeah.

Speaker: 00:18:51

What happens at scale is that, to be able to scale up a

Speaker: 00:18:56

load to this size or even larger, like we're going to do, you start having to

Speaker: 00:18:59

use performance tricks, where you look at a percentage of them and you have

Speaker: 00:19:02

a pretty reasonable assumption that things are working: check 25%.

Speaker: 00:19:07

If they all passed, you probably are okay.

Speaker: 00:19:08

But we don't have to do that.

Speaker: 00:19:09

And we're able to scale up.

Speaker: 00:19:11

I haven't found an uplink big enough that we can't handle that

Speaker: 00:19:14

with the available CPU power.

Speaker: 00:19:16

Essentially, Michael, people could also create this much load just by using ab

Speaker: 00:19:21

or httperf, et cetera, which is why people could say this is

Speaker: 00:19:28

nothing special. But what Jeremy is

Speaker: 00:19:30

telling us is that this load test is not only programmed in Goose, but

Speaker: 00:19:33

it does a lot more than just send traffic: it also analyzes the

Speaker: 00:19:37

traffic and all the pages.

Speaker: 00:19:40

And it's insane how efficient it is using the CPU.

Speaker: 00:19:43

Usually, if you want to create this much traffic, you would need several boxes,

Speaker: 00:19:48

several AWS boxes, to even do it, or some parallelization,

Speaker: 00:19:53

et cetera, with these other tools.

Speaker: 00:19:55

So it's really fantastic.

Speaker: 00:19:57

One Goose, one instance.

Speaker: 00:19:58

You're good to go.

Speaker: 00:19:59

No complexity.

Speaker: 00:20:00

That's great.

Speaker: 00:20:04

So the logical thing to do is to double everything

Speaker: 00:20:07

again and see what happens.

Speaker: 00:20:09

At least that's how it seems logical to me.

Speaker: 00:20:11

So we'll go from 2,000 to 4,000 users, and we'll launch 128 a second.

Speaker: 00:20:18

I'm going to restart this sar because I'm not sure it's going

Speaker: 00:20:21

to run long enough otherwise.

Speaker: 00:20:22

Yeah.

Speaker: 00:20:23

And the thing to see here is, again, it can

Speaker: 00:20:27

launch this many users this fast without problems.

Speaker: 00:20:30

Okay.

Speaker: 00:20:31

Yes.

Speaker: 00:20:32

So it can start them, which is great news.

Speaker: 00:20:34

What we're going to find is that we're going to hit a different

Speaker: 00:20:37

bottleneck this time though.

Speaker: 00:20:38

And it's not the server CPU.

Speaker: 00:20:40

It's going to be our uplink.

Speaker: 00:20:41

We stalled for a little while, creating user states, getting

Speaker: 00:20:44

everything ready to go, and then boom, all of the user threads start.

Speaker: 00:20:48

There's not much to watch here.

Speaker: 00:20:49

It's just going to scroll through all the users it creates.

Speaker: 00:20:52

So I'll keep it open on our CPU chart to see how much CPU it's using.

Speaker: 00:20:55

If there are errors, it'll become pretty apparent in various places.

Speaker: 00:20:59

So for now, we'll just keep an eye on this website.

Speaker: 00:21:05

I love it.

Speaker: 00:21:06

I love to just watch this in real time.

Speaker: 00:21:09

See the traffic coming in.

Speaker: 00:21:10

It's always my favorite part of when Tag1 is doing load tests for various clients.

Speaker: 00:21:16

And when we are doing this much load and then watching New Relic

Speaker: 00:21:19

going to insane numbers,

Speaker: 00:21:21

watching Fastly going to insane numbers.

Speaker: 00:21:23

It's like,

Speaker: 00:21:26

It's fun.

Speaker: 00:21:26

It's disappointing here.

Speaker: 00:21:27

We're not... we'll have the 4K requests per second on Fastly.

Speaker: 00:21:32

So we seem to have capped out here at 6,600 requests a second.

Speaker: 00:21:37

And sure enough, interestingly, even though

Speaker: 00:21:42

we have twice as many users, we're seeing the same amount of traffic.
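The scalability test Fabian described looks for the point where results stop growing linearly with load. Applied to the throughput reported in this session (about 3,300 requests per second at 1,000 users, 6,600 at 2,000, and still 6,600 at 4,000), a minimal knee check could look like the sketch below. The 0.8 threshold is an arbitrary assumption and `first_non_linear` is a hypothetical helper. Here the knee reflects the load generator's uplink, not the site; paired with server-side metrics such as New Relic response times, the same idea finds where the site itself starts to struggle.

```rust
/// Given (users, requests/second) samples from a ramped scalability run, return
/// the first user count whose requests-per-user drops below `threshold` times
/// the first sample's, i.e. where throughput stops growing linearly.
fn first_non_linear(samples: &[(u32, f64)], threshold: f64) -> Option<u32> {
    let &(base_users, base_rps) = samples.first()?;
    let base_per_user = base_rps / f64::from(base_users);
    samples
        .iter()
        .skip(1)
        .find(|&&(users, rps)| rps / f64::from(users) < threshold * base_per_user)
        .map(|&(users, _)| users)
}

fn main() {
    // (users, requests/second) figures reported in this session.
    let samples: [(u32, f64); 3] = [(1_000, 3_300.0), (2_000, 6_600.0), (4_000, 6_600.0)];
    match first_non_linear(&samples, 0.8) {
        Some(users) => println!("throughput stops scaling linearly at about {users} users"),
        None => println!("still scaling linearly"),
    }
}
```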

Speaker: 00:21:46

One thing

Speaker: 00:21:46

that's interesting:

Speaker: 00:21:47

If you compare this bandwidth chart to the previous bandwidth chart, it's flatter.

Speaker: 00:21:51

And so what's happening is the CPU has sufficient power.

Speaker: 00:21:55

It's able to make all of these requests, and because of the way async

Speaker: 00:21:58

works, with asynchronous requests,

Speaker: 00:22:00

whenever something stalls, it just goes on to another request.

Speaker: 00:22:02

And so it's able to flatten out how fast these requests are working.

Speaker: 00:22:06

So that load test finished nicely. It has a relatively flat line

Speaker: 00:22:09

because the CPU had enough power to, to keep generating requests.

Speaker: 00:22:14

even though the bandwidth was limited.

Speaker: 00:22:15

What that shows is that if we could start up a server with more power,

Speaker: 00:22:20

we could do more interesting things with it, like generate even more load.

Speaker: 00:22:23

So let's do exactly that.

Speaker: 00:22:25

I'm going to exit out of here.

Speaker: 00:22:29

I went ahead and started another instance.

Speaker: 00:22:30

This one is an r5n.8xlarge, and the 8xlarge means it has

Speaker: 00:22:36

32 cores.

Speaker: 00:22:38

and I believe it's supposed to have a 25-gigabit uplink.

Speaker: 00:22:41

That's great, because we need more bandwidth.

Speaker: 00:22:46

So we'll connect into the server.

Speaker: 00:22:50

And as before, we'll connect to it twice so that we can see

Speaker: 00:22:56

how much we're stressing it.

Speaker: 00:22:57

and there we go.

Speaker: 00:22:58

So inside the Goose directory we can spin up another load test.

Speaker: 00:23:02

Now that we're not capped by bandwidth (we hope), let's rerun our 4,000-user test.

Speaker: 00:23:07

Make sure there are no typos. This looks good.

Speaker: 00:23:09

Goose debug file one minute, 4,000 users.

Speaker: 00:23:12

All right.

Speaker: 00:23:13

So, um,

Speaker: 00:23:15

Yup.

Speaker: 00:23:16

It's gonna start allocating memory.

Speaker: 00:23:19

And that takes a bit of CPU power.

Speaker: 00:23:22

We'll be able to see it on Fastly once this traffic starts hitting.

Speaker: 00:23:27

And the goal here is that, because we have more uplink,

Speaker: 00:23:31

we're able to generate more traffic.

Speaker: 00:23:32

The expectation is absolutely that it should work.

Speaker: 00:23:35

Let's see it start hitting Fastly.

Speaker: 00:23:36

The server doesn't have to work very hard because, again, it has 32 cores.

Speaker: 00:23:40

So even though it's generating quite a lot of users, there's plenty of power there.

Speaker: 00:23:43

So that's not much of an issue now.

Speaker: 00:23:46

We've got the requests starting to come in.

Speaker: 00:23:52

And already you can see traffic on the Seattle PoP.

Speaker: 00:23:56

Uh, whoops.

Speaker: 00:23:58

There was a spike, which means there are errors, and sure enough, there are.

Speaker: 00:24:01

So what we have going on here is a perfect storm of too many

Speaker: 00:24:05

requests, and the server's down.

Speaker: 00:24:07

So there's no point in running any further.

Speaker: 00:24:09

What's happened is that this was able to ramp up so fast, and it was able to

Speaker: 00:24:13

hit enough pages that were not yet in the cache that it caused some errors.

Speaker: 00:24:17

It's worth taking a quick look here: we've

Speaker: 00:24:21

got data in our debug log.

Speaker: 00:24:23

It has 19 megs of data there.

Speaker: 00:24:25

But if you look, you can get some sense of what's happening, and sure

Speaker: 00:24:28

enough, here you can see that we're not getting anything back.

Speaker: 00:24:31

The server's just not responding.

Speaker: 00:24:33

Oh, actually, I'm wrong.

Speaker: 00:24:35

Okay.

Speaker: 00:24:35

It wasn't what I thought it was.

Speaker: 00:24:37

The specific problem here is that DNS stopped working.

Speaker: 00:24:39

We killed DNS locally.

Speaker: 00:24:41

The solution, fortunately, is quite easy.

Speaker: 00:24:43

If we ping umami.tag1.io and grab this IP address, we can

Speaker: 00:24:49

add it to our hosts file.

Speaker: 00:24:51

And I did have it.

Speaker: 00:24:52

Oh, it is there.

Speaker: 00:24:55

Okay.

Speaker: 00:24:56

Next theory.

Speaker: 00:25:00

Too many open files.

Speaker: 00:25:01

That's our problem.

Speaker: 00:25:02

All right.

Speaker: 00:25:03

So the other issue is that you need to increase your ulimit, and I set it crazy

Speaker: 00:25:08

large so that we don't run into this.

Speaker: 00:25:10

I forgot we've spun up a new server,

Speaker: 00:25:11

and I didn't make that permanent, so I have to set the limit again.
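A quick way to catch the "too many open files" problem before a large run is to check the open-file limit first. The sketch below is Linux-specific (it reads `/proc/self/limits`, so run it from the same shell you launch the load test from); the function name and the planning assumption that each simulated user holds at least one open connection are illustrative, not something Goose itself does. Raising the limit is still done with the shell's `ulimit -n`.

```rust
use std::fs;

/// Extracts the soft "Max open files" limit from the text of /proc/self/limits.
/// Returns None if the line is missing or the soft limit is "unlimited".
fn max_open_files(limits: &str) -> Option<u64> {
    let line = limits.lines().find(|l| l.starts_with("Max open files"))?;
    // After the label come the soft limit, the hard limit, and the units.
    line.trim_start_matches("Max open files")
        .split_whitespace()
        .next()?
        .parse()
        .ok()
}

fn main() {
    // Hypothetical planning figure: assume each simulated user needs at least one socket.
    let planned_users: u64 = 4_000;
    match fs::read_to_string("/proc/self/limits") {
        Ok(text) => match max_open_files(&text) {
            Some(soft) if soft < planned_users => eprintln!(
                "open-file limit is {soft}; raise it (e.g. `ulimit -n`) before launching {planned_users} users"
            ),
            Some(soft) => println!("open-file limit {soft} should cover {planned_users} users"),
            None => println!("open-file limit not found or unlimited"),
        },
        Err(e) => eprintln!("could not read /proc/self/limits (Linux only): {e}"),
    }
}
```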

Speaker: 00:25:15

Now, if we run again, everything should work a lot better.

Speaker: 00:25:18

And luckily, that should be all we need to do to be

Speaker: 00:25:21

able to run this test properly.

Speaker: 00:25:24

So the two things I ran into when I was testing this were DNS

Speaker: 00:25:28

dying, where for me the quickest solution was adding it to the hosts file, and not

Speaker: 00:25:32

having enough file handles, which is why we increase the limit with ulimit.

Speaker: 00:25:36

That makes perfect sense.

Speaker: 00:25:37

If you want to load test, it can make sense to just

Speaker: 00:25:41

put in the IP address.

Speaker: 00:25:43

However, one needs to be careful when doing that with a distributed load test,

Speaker: 00:25:48

because then you could end up hitting Fastly's Seattle PoP from China.

Speaker: 00:25:54

That's not what we want to do, obviously.

Speaker: 00:25:57

But then we also don't want to load test DNS, so it

Speaker: 00:26:01

makes sense to cache it locally.

Speaker: 00:26:03

I was pinging from each server and then using the response to

Speaker: 00:26:06

the ping to set up the hosts file.

Speaker: 00:26:07

So whatever resolves for that server, I essentially just cache in the hosts file.

Speaker: 00:26:11

And that works pretty well.
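This per-server pinning can be scripted. The speaker pinged the hostname on each load generator; the sketch below instead asks the system resolver once through Rust's standard `ToSocketAddrs` and prints a line in `/etc/hosts` format. Because Fastly may hand out a different PoP address depending on where the lookup happens, the idea is to run it on each load generator rather than copy one address everywhere. The function names are illustrative.

```rust
use std::io;
use std::net::{IpAddr, ToSocketAddrs};

/// Resolves `host` once with the system resolver and returns the first address.
fn resolve_once(host: &str) -> io::Result<IpAddr> {
    // ToSocketAddrs needs a port; it does not affect name resolution.
    (host, 443u16)
        .to_socket_addrs()?
        .next()
        .map(|addr| addr.ip())
        .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, format!("no address for {host}")))
}

/// Formats an /etc/hosts line pinning `host` to `ip`.
fn hosts_entry(ip: IpAddr, host: &str) -> String {
    format!("{ip}\t{host}")
}

fn main() {
    // Hostname from the command line, defaulting to the demo site from this episode.
    let host = std::env::args().nth(1).unwrap_or_else(|| "umami.tag1.io".to_string());
    match resolve_once(&host) {
        // Print the line; appending it to /etc/hosts (as root) is left to the operator.
        Ok(ip) => println!("{}", hosts_entry(ip, &host)),
        Err(e) => eprintln!("failed to resolve {host}: {e}"),
    }
}
```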

Speaker: 00:26:12

So now we have what we like to see.

Speaker: 00:26:13

We can see a proper ramp-up happening, and it's continuing

Speaker: 00:26:17

to ramp up fairly aggressively.

Speaker: 00:26:18

That's a lot of requests per second that Goose is making,

Speaker: 00:26:21

and every single one is validated.

Speaker: 00:26:23

It's the real deal.

Speaker: 00:26:24

Plus, we can see in the global PoP traffic that

Speaker: 00:26:27

we've hit a new level.

Speaker: 00:26:29

We went from topping out at 4K;

Speaker: 00:26:31

now we're talking about 8K.

Speaker: 00:26:33

As before, we were able to double it.

Speaker: 00:26:35

And sure enough, bandwidth is up at 4.8 to 5 gigabits per second of

Speaker: 00:26:38

traffic, pretty consistently.

Speaker: 00:26:41

Can we go even higher?

Speaker: 00:26:43

Yes.

Speaker: 00:26:44

In fact, we can. I just like doubling

Speaker: 00:26:47

and seeing how far we can get.

Speaker: 00:26:49

So let's double it again.

Speaker: 00:26:50

I'll go ahead and cancel this.

Speaker: 00:26:51

There's no need to let it run to completion at this point.

Speaker: 00:26:54

I'll let it do a clean shutdown.

Speaker: 00:26:57

Actually, you know what, let's just force it to quit.

Speaker: 00:26:58

So it's quicker.

Speaker: 00:26:59

128, we'll go to 256, and we'll double 4,000 to 8,000.

Speaker: 00:27:06

So now we're gonna simulate 8,000 users actively loading

Speaker: 00:27:09

the page very aggressively.

Speaker: 00:27:13

Yeah, let's start that.

Speaker: 00:27:14

We'll see whether or not our uplink can handle 8,000 users.
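The doubling steps can be tabulated. This assumes the 128-to-256 value is Goose's hatch rate (users launched per second); the transcript doesn't name the option, so treat that as an assumption, and the `Step` type is hypothetical. Under that assumption, doubling users and hatch rate together keeps the ramp-up time constant at 4,000 / 128 = 31.25 seconds.

```rust
/// One step of a "keep doubling" load-test schedule (hypothetical helper).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Step {
    users: u32,
    /// Assumed to be Goose's hatch rate: users launched per second.
    hatch_rate: u32,
}

impl Step {
    /// Seconds needed to launch every user at the given hatch rate.
    fn ramp_up_secs(self) -> f64 {
        f64::from(self.users) / f64::from(self.hatch_rate)
    }

    /// Doubles users and hatch rate together, which leaves ramp-up time unchanged.
    fn doubled(self) -> Step {
        Step { users: self.users * 2, hatch_rate: self.hatch_rate * 2 }
    }
}

fn main() {
    // Starting point from the episode: 4,000 users at 128 (assumed users/second).
    let mut step = Step { users: 4_000, hatch_rate: 128 };
    for _ in 0..3 {
        println!(
            "{} users at {}/s -> ramp-up {:.2}s",
            step.users, step.hatch_rate, step.ramp_up_secs()
        );
        step = step.doubled();
    }
}
```

Keeping the ramp-up constant makes successive runs easier to compare, since only the steady-state load changes between steps.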

Speaker: 00:27:18

Go ahead.

Speaker: 00:27:19

While we're waiting for that,

Speaker: 00:27:20

could you add DNS caching like that to Goose itself?

Speaker: 00:27:25

Is it only one request,

Speaker: 00:27:29

or...?

Speaker: 00:27:31

Yes.

Speaker: 00:27:32

I mean, it might get complicated because there's nothing to prevent your load

Speaker: 00:27:34

tests from hitting multiple domains.

Speaker: 00:27:36

But that's interesting.

Speaker: 00:27:38

We should look into that.

Speaker: 00:27:40

You want to file

Speaker: 00:27:41

Do you want to

Speaker: 00:27:41

take a

Speaker: 00:27:41

request live on stage?

Speaker: 00:27:43

Nice.

Speaker: 00:27:45

I like the idea.

Speaker: 00:27:46

It'd be a good question for Narayan whether or not he would like that

Speaker: 00:27:49

feature, but it seems great to me.
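The DNS-caching idea raised here could look something like a per-host cache. This is a hypothetical sketch, not part of Goose's API: `DnsCache` and its methods are names introduced for illustration, and it uses the standard library's blocking resolver. Keying the cache by hostname is what addresses the concern about a load test hitting multiple domains.

```rust
use std::collections::HashMap;
use std::io;
use std::net::{IpAddr, ToSocketAddrs};

/// Hypothetical per-host DNS cache (not a Goose API). Each distinct host is
/// resolved once; later lookups for the same host are served from memory.
struct DnsCache {
    entries: HashMap<String, IpAddr>,
    /// Number of lookups that actually went to the system resolver.
    misses: usize,
}

impl DnsCache {
    fn new() -> Self {
        DnsCache { entries: HashMap::new(), misses: 0 }
    }

    fn resolve(&mut self, host: &str) -> io::Result<IpAddr> {
        if let Some(ip) = self.entries.get(host) {
            return Ok(*ip); // cache hit: no DNS traffic
        }
        // Cache miss: resolve once via the system resolver (the port is a placeholder).
        let ip = (host, 443u16)
            .to_socket_addrs()?
            .next()
            .map(|addr| addr.ip())
            .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, format!("no address for {host}")))?;
        self.misses += 1;
        self.entries.insert(host.to_string(), ip);
        Ok(ip)
    }
}

fn main() -> io::Result<()> {
    let mut cache = DnsCache::new();
    // IP literals are used here so the demo needs no network access.
    for host in ["127.0.0.1", "127.0.0.1", "10.0.0.1"] {
        println!("{host} -> {}", cache.resolve(host)?);
    }
    println!("resolver calls: {}", cache.misses);
    Ok(())
}
```

A real implementation would also need to decide how long entries stay valid, since this sketch never expires them.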

Speaker: 00:27:50

Whereas before we were using 50% of our server to generate traffic,

Speaker: 00:27:54

we'll see how much we end up using for this much traffic.

Speaker: 00:27:59

But our Fastly stats should start ramping up pretty quickly here,

Speaker: 00:28:11

and there we go.

Speaker: 00:28:12

We can see there are no errors, because it's ramping up cleanly like that.

Speaker: 00:28:18

It's catching up.

Speaker: 00:28:19

We're already surpassing the original tests.

Speaker: 00:28:21

Now we're at 8K requests per second.

Speaker: 00:28:23

It's still going up.

Speaker: 00:28:24

12K, 16K.

Speaker: 00:28:25

Sure enough.

Speaker: 00:28:26

We were able to double it again.

Speaker: 00:28:27

So we're doing 20,000 to 23,000 requests a second and sustaining 9.2 gigabits,

Speaker: 00:28:34

nine gigabits of traffic a second, with a nice steady line, I should add.

Speaker: 00:28:38

It's not spiky traffic.

Speaker: 00:28:40

It's very, very consistent traffic, which is fantastic when you're trying to do a

Speaker: 00:28:43

load test. And the server is using its CPUs; there's

Speaker: 00:28:47

only 4% idle, but it's generating some impressive load.
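As a rough sanity check on these numbers, the sketch below divides bandwidth by request rate to estimate the average response size. It assumes the dashboard figure is gigabits per second (gigabytes per second would exceed the instance's 25-gigabit uplink several times over) and that both readings cover the same window; the function names are illustrative.

```rust
/// Average response size implied by a bandwidth reading and a request rate.
/// Assumes the bandwidth figure is in gigabits per second (1 Gbit = 1e9 bits).
fn avg_bytes_per_request(gigabits_per_sec: f64, requests_per_sec: f64) -> f64 {
    gigabits_per_sec * 1e9 / 8.0 / requests_per_sec
}

/// Fraction of a network link's capacity used (both values in Gbit/s).
fn uplink_utilization(gigabits_per_sec: f64, uplink_gigabits: f64) -> f64 {
    gigabits_per_sec / uplink_gigabits
}

fn main() {
    let bandwidth = 9.2; // Gbit/s, read from the Fastly dashboard (assumed unit)
    for rps in [20_000.0, 23_000.0] {
        println!(
            "{rps} req/s -> ~{:.0} bytes per response",
            avg_bytes_per_request(bandwidth, rps)
        );
    }
    println!("uplink utilization: {:.0}%", uplink_utilization(bandwidth, 25.0) * 100.0);
}
```

Under those assumptions, 9.2 Gbit/s at 20,000 to 23,000 requests per second works out to roughly 50 to 57.5 KB per response, using about 37% of a 25-gigabit link.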

Speaker: 00:28:54

This was really cool to see.

Speaker: 00:28:55

Thank you guys so much for walking us through this.

Speaker: 00:28:58

We're going to do some follow-ups to this.

Speaker: 00:29:00

We're going to do some how-tos on testing with authenticated traffic.

Speaker: 00:29:04

We're going to show you how to spin up Gaggles and do

Speaker: 00:29:07

distributed load testing with Goose. And Fabian,

Speaker: 00:29:10

I'm going to take you up on your scalability webinar, to show

Speaker: 00:29:14

people how all that works.

Speaker: 00:29:15

So please stay tuned for more.

Speaker: 00:29:17

We'll put all of the links that Jeremy and Fabian mentioned in the show notes.

Speaker: 00:29:21

Do you have any questions for us?

Speaker: 00:29:22

Please reach out to us at tag1teamtalks@tag1.com.

Speaker: 00:29:27

We'd love your feedback on this, questions about Goose, or even better

Speaker: 00:29:30

hit us up in the issue queues on GitHub.

Speaker: 00:29:33

And we love feature suggestions.

Speaker: 00:29:35

What should we cover in the future on this?

Speaker: 00:29:37

You can see our past Tag1 Team Talks at tag1.com/TTT for Tag1 Team Talks,

Speaker: 00:29:43

and you can go to Tag1.com/goose to get a lot of links and information

Speaker: 00:29:49

about Goose. Jeremy, Fabian,

Speaker: 00:29:51

again, a huge thank you for walking us through this today. And to our listeners,

Speaker: 00:29:55

thanks for tuning in for another Tag1 Team Talk.
