Improving Drupal CMS Performance with Gander: Real-World Wins and Lessons

:: 00:00

Michael Meyers: Hello and welcome to Tag1 Team Talks, a podcast of Tag1 Consulting. Today we're gonna be exploring some of Tag1's work on Drupal CMS and how Gander's being integrated as a foundational element in the next generation of content management systems. Developed by Tag1 in collaboration with the Google Chrome team, Gander is Drupal's official performance testing framework, ensuring that performance is a key consideration in Drupal development.

Joining me today is Nathaniel Catchpole, Tag1 team member and Gander project lead. For over 20 years, Catch has been one of the most prolific and influential contributors to the Drupal platform. You don't know Drupal core if you don't know Catch. Welcome to the show, Catch. Thanks for joining me.

Nathaniel Catchpole: Hey, how are you doing?

active contributors to open [: 00:01:00

If you need help with your sites, platforms, or applications, please email us at info@tag1.com. That's TAG, the number one, dot com. Thanks, Catch. I really appreciate you coming back for another update on Gander. Before we jump into all the work that you've been doing with the Drupal CMS team on Gander, there might be some listeners who don't really understand what Gander is and how it works.

So why don't we just step back: can you tell us, what is Gander?

what we do, we work with the [: 00:02:00

The way that we implemented it is that we use Drupal's functional JavaScript tests, which run in a real browser. Then we collect data from the browser performance logs, and also from the Drupal backend, the site under test, aggregate those, and allow assertions to be made on them within the test.

So you can check how many style sheets are downloaded, and the same for JavaScript. You can check how many images are on the page. But you can also check cache gets and database queries on the backend as well, all from the same single request. So it's not like you've got separate front end and backend tests.

k. You can set up a scenario [: 00:03:00

Doing it in PHPUnit gives you complete control over the site under test, so you know that the scenarios are the same every single time it runs, unless you make a mistake, but at least there's a good chance of getting exactly the same result. And then it lets us fail tests when things go wrong.

So if we say this page has got five database queries, and then a change goes into an MR and it makes six database queries, the test fails. So it's not like you've just got lines somewhere that you can look at once a month. You get immediate feedback, even running tests on the command line, and especially on GitLab CI.
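The "five queries becomes six, so the test fails" idea can be sketched outside Drupal. This is a minimal Python illustration, not Gander's actual API (Gander tests are written in PHPUnit); `PageMetrics` and `check_budget` are hypothetical names made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageMetrics:
    """Counts collected for one page request (hypothetical structure)."""
    query_count: int
    cache_get_count: int
    stylesheet_count: int


def check_budget(actual: PageMetrics, expected: PageMetrics) -> list[str]:
    """Return one message per metric that differs from the expected value."""
    failures = []
    for field in ("query_count", "cache_get_count", "stylesheet_count"):
        got, want = getattr(actual, field), getattr(expected, field)
        if got != want:
            failures.append(f"{field}: expected {want}, got {got}")
    return failures


# The page is expected to run five queries; a change makes it run six.
expected = PageMetrics(query_count=5, cache_get_count=12, stylesheet_count=2)
after_change = PageMetrics(query_count=6, cache_get_count=12, stylesheet_count=2)

failures = check_budget(after_change, expected)
print(failures)  # a non-empty list would fail the test run
```

Comparing exact counts rather than "no more than" is a design choice in this sketch: it also flags improvements, so the expected numbers get updated when a query is removed.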

n't really test on, like the [: 00:04:00

But those we send to an OpenTelemetry dashboard built with Grafana, and those you do see on the graph, and that gives you comparisons between different kinds of scenarios and over time.

Michael Meyers: So you can write tests for pretty much all of these things, integrate them into your CI, pass, fail, or warn based on certain conditions, and monitor the impact of changes on your application's performance over time. And all this helps you catch problems early so that it's significantly easier to address them. And frankly, a lot of times performance problems aren't caught at all until they become a glaring nightmare.

So this really helps you get ahead of the problem. How has Gander been used to date in Drupal core development?

e the standard and the Umami [: 00:05:00

So it's the closest you can get within Drupal core to having a realistic site to test. What this means is that by hitting pages in Umami, we're seeing the impact of a lot of different code that runs in Drupal core. So if something happens to Views, we see that, something like that. So they're not very low-level targeted tests like unit tests.

They're very high-level tests that hit a real page that does quite a lot. But then you pick up changes on that page and then you can track it down. And as I said, we've got front end and backend assertions, so we're checking both the front end performance of those pages, how many JavaScript aggregates there are and how big the scripts and style sheets are in bytes, and also how many database queries and cache gets and things like that.

And then, [: 00:06:00

Michael Meyers: So, tell me about how you surface this data. I know there's a dashboard, are there logs and local ways of viewing this? How do people see the data associated with this?

Nathaniel Catchpole: So if you introduce an issue that's gonna cause a failure in one of the existing tests, you get an immediate report on GitLab CI, and that's down to about five to seven minutes now.

ich has the database queries [: 00:07:00

So it's one of two things, but I think most people who interact with it interact with it when they cause a test to fail, 'cause that's in your face. You don't have to go looking for that.

Michael Meyers: So, you've talked a little bit about what it does and how it works. Is it working?

Could you share some real-world success stories of how Gander has improved things?

Nathaniel Catchpole: So, yeah, the most recent improvements have mainly come via the testing for Drupal CMS. Obviously the Drupal core tests can only test what's in Drupal core, and there's a limited number of scenarios that are tested in Drupal core.

till now just test the basic [: 00:08:00

So it installs that, and then it visits the front page and logs in, and it checks logged in and logged out. And just from that test, it's a couple of pages, it's a very short test, we must have found half a dozen issues. Quite big issues as well. So the first one that jumps out.

avaScript. Like you often see: 1550

Does it need to be? And I also found an issue. Someone beat me to it by two days, but I did find the same issue that they found, with the performance tests: that the library size was really big, unexpectedly big. And the great thing was that within a couple of weeks that was fixed.

Both issues were fixed. So they found out first of all that, not even the Drupal module, but the upstream Klaro project had set the wrong flag when building one of the library variants it ships, and just fixing that little flag upstream took it down from 300 kilobytes to 60. This is uncompressed.

ng after the privacy recipe. [: 00:10:00

Klaro Cookie Consent Management. It's only required if you have something enabled on your site that requires cookie consent. So Drupal CMS ships with YouTube embeds. If you embed a YouTube video, it wants to check whether you consent to having YouTube cookies before it shows you the video. But if you haven't actually embedded any YouTube videos on your site, you could embed one, but you haven't got any, so it doesn't need to check YouTube.

But what it was doing was giving you the option to check YouTube consent when there were no videos on the site to consent to. So he rewrote the recipe so it only adds each one when content that requires the consent is actually added. It had already gone from 300 kilobytes to 60; then it went down to zero.

ans like just no JavaScript, [: 00:11:00

Sometimes you find performance issues and then they don't get fixed, or it takes years to figure out how to fix it. So it was nice that it was really quick. And it's not just fixing Drupal CMS. It's a contrib module, so everyone who has been using it will get a smaller JavaScript library when they update to the latest version.

And you could implement the same logic as the recipe, where it's progressively enabled, so it just doesn't switch on for most sites that aren't embedding YouTube videos or whatever they need it for, but it's there when they need it. So that was good.

ike that because that's what [: 00:12:00

But because it's what will be in there, Drupal CMS wanted to preview it early and make it a default experience, to give people something new but still in development. And we knew that there were some caching issues to improve with it, but hadn't quite figured out what to do about them yet.

But when writing the performance test, it went from one page to the next page, and you could see the navigation database queries building the menus on both of those requests. And what I realized is that even though it's cacheable, and Drupal's dynamic page cache would cache it, we were rebuilding the navigation, the menu tree, on each different page.

he site, based on your roles.[: 00:13:00

And that means it only builds once for you for the entire site. You can go to a thousand pages and it's only actually built once, and it's one cache entry. But that still meant that the whole thing would end up in the dynamic page cache. So the HTML gets cached in your database a thousand times, even though it's all coming from one nested cache entry.

So what we also did was placeholder it. A placeholder means that instead of all of the HTML being embedded in the dynamic page cache, it's just a little placeholder, and that gets replaced. And that meant that it would be loaded by BigPipe, which causes content layout shift.
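To make the two ideas concrete (one navigation cache entry per set of roles, and a placeholder stored in the page cache instead of the full navigation HTML), here is a small Python sketch. It is an illustration only and not Drupal's render or cache API; `Site`, `PLACEHOLDER`, and the cache layout are assumptions invented for the example, and the page cache is keyed by path alone to keep it short.

```python
PLACEHOLDER = "<!--navigation-placeholder-->"


class Site:
    """Toy model: a per-role navigation cache plus a per-path page cache."""

    def __init__(self) -> None:
        self.nav_cache: dict[frozenset[str], str] = {}
        self.page_cache: dict[str, str] = {}
        self.nav_builds = 0

    def navigation(self, roles: set[str]) -> str:
        # Built once per distinct set of roles, then reused on every page.
        key = frozenset(roles)
        if key not in self.nav_cache:
            self.nav_builds += 1
            self.nav_cache[key] = "<nav>" + ", ".join(sorted(key)) + "</nav>"
        return self.nav_cache[key]

    def render(self, path: str, roles: set[str]) -> str:
        # The cached page holds only the placeholder, never the nav markup.
        if path not in self.page_cache:
            self.page_cache[path] = f"<body>{PLACEHOLDER}<main>{path}</main></body>"
        return self.page_cache[path].replace(PLACEHOLDER, self.navigation(roles))


site = Site()
pages = [site.render(f"/page/{n}", {"editor"}) for n in range(1000)]
print(site.nav_builds)  # number of times the navigation was actually built
```

In this toy version the placeholder is swapped in on the server before the response is returned, so nothing shifts in the browser; how and when the replacement happens is exactly the trade-off the conversation goes on to discuss.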

't duplicated across all the [: 00:14:00

And that led to some Drupal core improvements. We didn't have this cache placeholder strategy, so you always had a choice of not placeholdering something, or jank, like, you know when your page loads and the sidebar comes in and stuff moves across. So we were able to find, by improving the Navigation module, a way to improve Drupal core's placeholdering strategy.

Which means that you can have BigPipe enabled, and when you've got cold caches, things are slow, but they're loaded progressively, so the actual page load is fast. But once the caches are warm, you don't need that. You can just load a string of HTML from the database, well, from the cache, and send it straight out to the page.

Drupal CMS, we went through [: 00:15:00

And then the current toolbar in Drupal core that it replaces has custom JavaScript. It uses the browser's local storage to avoid all of the problems that we found, because it was built in Drupal 7, and the capabilities that we have now were not available then. So it had to do a load of custom code, and it's very complicated and very custom. That's what's in Drupal core now, the old toolbar.

Navigation has no custom code for this. It's just using Drupal core APIs and a few render array conventions, and it gets all of this for free. So that's been really nice to see. And we would've got that in Drupal core eventually, but we got it quicker because we tested it against Drupal CMS.

. Yeah. This is so exciting. [: 00:16:00

You, I know you have more and we gotta move on, but we have to do an episode just on these success stories, because it's so cool to see how, even in its early days and stages, it's starting to fulfill that dream. Originally, I think about some of the use cases you talked about, guardrails around kilobytes of JavaScript size, and there are different use cases, but I have to admit, the first time I heard you talking about it, I was like, why would you need that? Right? And here's a really amazing example of how a simple test like that can catch a major issue and lead to cascading benefits. And so there's Drupal, which is developed at a large scale, and enterprises that develop at a large scale.

efit in different ways. And, [: 00:17:00

It's been a couple of months now. Can you give a little bit more detail? You had a bunch of stuff originally with Gander and its integration in Drupal core and what you're doing with core development. You touched on some of the Drupal CMS stuff. What has been the focus over the last two, three months?

Nathaniel Catchpole: So we've found a few things to improve with the performance testing framework itself. The one thing that was added three or four months ago was database query assertions. We used to assert the database query count, so see how many database queries ran.

ut he added like a query log [: 00:18:00

And if a different one gets added, in your test failure you see a diff: oh, this one got added. And if you remove one: oh, this one got removed. And then most of the time you just either add the line or delete the line, 'cause it's kinda what you're expecting. If it's unexpected, you don't have to go and get out the debugger.

In the test, you can see what was added or what was removed. It's getting close to the capabilities of Devel's query log and the Webprofiler module, which give you a query log for the page. But the difference is that when you are looking at a normal site, you might have to click around quite a lot to actually find out where that query is.

And then you still have to look at a list of a hundred queries and mentally diff which one got added or removed. It's, oh, that's 121 now, and then which one of those is the different one? And here it's, oh, that's the one that was added, that's the one that was removed.
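The "see a diff of the query log instead of mentally diffing a hundred queries" idea looks roughly like this in Python, using the standard library's `difflib`. This is a sketch of the concept only; the function name `query_log_diff` and the example queries and table names are made up, and Gander's real assertions live in PHPUnit.

```python
import difflib


def query_log_diff(expected: list[str], actual: list[str]) -> list[str]:
    """Return only the queries that were added ('+ ') or removed ('- ')."""
    return [
        line
        for line in difflib.ndiff(expected, actual)
        if line.startswith(("+ ", "- "))
    ]


# Illustrative queries only; these are not real Drupal queries.
expected_log = [
    "SELECT name, value FROM example_config WHERE name = 'site'",
    "SELECT uid, name FROM example_users WHERE uid = 1",
]
actual_log = [
    "SELECT name, value FROM example_config WHERE name = 'site'",
    "SELECT target FROM example_redirect WHERE source = '/about'",
    "SELECT uid, name FROM example_users WHERE uid = 1",
]

for line in query_log_diff(expected_log, actual_log):
    print(line)
```

When the change is intended, the fix is the one described in the conversation: add or delete that single line in the expected log.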

And then similar-ish [: 00:19:00

And that render cache is tagged with the id of the node. So when the node is updated and the title changes or the image is changed or something, the cache item gets invalidated and then you get a fresh copy. So you don't have to worry about like setting short TTLs, you can set permanent, you can cache it forever.

e got the cache tags they've [: 00:20:00

But if you are in a database-constrained situation and you're looking at queries per second, the percentage of the queries per second that are cache tag queries, or even if it's Redis, where there's a certain point at which you hit requests-per-second issues with the Redis backend, the percentage is quite high.

So he was trying to look at how to get rid of them. So he's added assertions; it's called, like, group cache tags. Essentially, when you look up cache items, say one's got five cache tags on it, we look up those five. If we look up the next one and it's got the same five cache tags, we don't look 'em up, 'cause we've already got them, it's totally cached. But if there's one different, it'll look that one up.

So he's added that, and you can actually see on the page not only the database queries that are running, but also which cache tags are being looked up, in which order. And from that we've found ways to optimize the cache tag lookups.
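The lookup behaviour described here (look up five tags once, skip them next time, fetch only the one that is new) can be sketched as a per-request memo in Python. This is an illustration under assumptions, not Drupal's implementation: `TagLookupMemo`, the `backend` callable, and the counter values are all invented for the example.

```python
from typing import Callable


class TagLookupMemo:
    """Remembers tag values already fetched in this request and records each backend call."""

    def __init__(self, backend: Callable[[list[str]], dict[str, int]]) -> None:
        self._backend = backend
        self._known: dict[str, int] = {}
        self.lookups: list[list[str]] = []  # one entry per backend round trip

    def get(self, tags: list[str]) -> dict[str, int]:
        # Only ask the backend about tags we have not seen yet (deduplicated).
        missing = [t for t in dict.fromkeys(tags) if t not in self._known]
        if missing:
            self.lookups.append(missing)
            self._known.update(self._backend(missing))
        return {t: self._known[t] for t in tags}


# Stand-in for a database or Redis holding per-tag invalidation counters.
storage = {"node:1": 3, "node_list": 7, "user:5": 1}


def backend(tags: list[str]) -> dict[str, int]:
    return {t: storage.get(t, 0) for t in tags}


memo = TagLookupMemo(backend)
memo.get(["node:1", "node_list"])            # first item: both tags fetched
memo.get(["node:1", "node_list"])            # same tags: no backend call
memo.get(["node:1", "node_list", "user:5"])  # one new tag: only that one fetched
print(memo.lookups)
```

Recording `lookups` in order is what makes it possible to assert on it in a test, in the same spirit as asserting on the query log.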

:: 00:21:00

So there are a few things that pretty much any request does, and one fix for that is already in 11.2; that one went in. Any request to Drupal, HTML requests at least, is going to load about 10 or 12 cache tags, and they're all loaded separately, for quite low-level cache items that load individually.

t cache request preloads like: 1215

Would be to profile it with XHProf and take a screenshot, then apply the change and take another screenshot, and then cross your heart and hope to die that you hadn't made anything up or hit the wrong page. And it is very easy to get wrong. You'd see ones where it's obvious someone's completely messed up the testing.

'Cause they'd hit it with a cold cache, then hit it with a warm cache, and it was like, well, it's doing all this other stuff, there's no way you fixed all of that, 'cause that's different. But with the performance test, it's just: here's exactly what change has occurred.

And you can also see when you haven't fixed it, 'cause it's not affecting the test. Sometimes we find things that we can't performance test, but as it's maturing, we are finding ways to test those things so that we can show it in the test coverage. So the performance tests are driving improvements in some areas, like when we test something we haven't tested before.

ues to fix. But also when we [: 00:23:00

I think there's plenty more to do, there's always things to fix, but it's getting easier to work on performance improvements. And you can see 11.2 has got quite a lot of improvements, and I don't think we would've had those without it. Wow.

Michael Meyers: And a huge thank you to Sasha, Christian, everybody.

To hear that people aren't just using Gander, but building out its features and capabilities, that's super cool. That's a really great thing to hear. And the impact that it's having is fantastic. So that's a good segue. What's on your roadmap?

o like super long-term stuff [: 00:24:00

Nathaniel Catchpole: So in progress, we're trying to add Drupal CMS to the Gander dashboard. I nearly thought I'd got it working a week ago, but it's not quite there yet. It could have been there for today, but maybe next week. All the groundwork is there.

It just needs to actually show up on the board. So that'll be there alongside Drupal core. And there was an issue with the dashboard, I think it's a quirk of how Grafana works: you had to run at least three tests an hour to get it to show a line, which sounds really minor, but was quite tricky and annoying.

ecause it would overload the [: 00:25:00

So that part of it's done; it's just that the actual Drupal CMS isn't quite reporting back as it should be yet. I'm starting to see the first contrib test actually works on the C tape work. He also wrote a test for the Redirect module. Redirect allows you to store, in your database, a redirect from one path to another.

You usually use it with Pathauto, so that when you change the path alias you can redirect from the old one to the new one. But that does a database query on every request. Every request, every page, checks if there's a redirect. But there are a lot of pages that are never gonna have a redirect.

one redirects from one admin [: 00:26:00

And if you think it's only one database query per request, but there are tens of thousands, maybe hundreds of thousands of sites with the Redirect module on, and every single PHP request that doesn't go through the page cache is a query. I don't know how many requests there are to Drupal sites a month, but it's probably trillions.

So if you can remove that one, that's like a trillion database queries that will just never happen again. Right.

Michael Meyers: Are you saying there are a lot of database.

I had no idea.

le: It's it's like it's only [: 00:27:00

Michael Meyers: This certainly wasn't the primary goal of building Gander, but an interesting outcome is that, I don't wanna call this a micro-optimization, but if you think about the millions of websites that run Drupal, making a small optimization has a huge impact.

That could be millions of websites that are doing a few less queries every time a page is built, which requires less energy to build and serve a page. In aggregate it does... I shouldn't have laughed. It is funny. But in aggregate, there actually, I.

r site, it's like death by a [: 00:28:00

You just keep getting slower and slower and slower, every freaking release. And the next thing you know, you're a quarter cycle in and it has a material impact on your site, on your business, on everything. And so, any way you look at it, you can of course prioritize where you expend your resources in addressing the problems that you find, right?

You need to rack and stack and triage. But, these little things should not be overlooked for many reasons.

Nathaniel Catchpole: Yeah. And it's also, say you've got five things that take 10 milliseconds each. That might be easier to fix than one thing that takes 15 milliseconds. It's not necessarily wasted work. It never hurts to do less.

And it makes things easier to find as well. If you can eliminate noise, even if it's not actually taking the time, then it's easier to find where your actual real problems are. But if you've got a very noisy thing, loads and loads, it's like the cache tag queries.

cash tag queries, then like [: 00:29:00

Michael Meyers: Yeah, that's a good point. Correct me if I'm wrong, but Gander tries to normalize things by creating consistent environments, like you said, cache scenarios within which to test things. But it's an example install. It's not a pure replica of your production environment.

to potentially, prioritize, [: 00:30:00

Nathaniel Catchpole: So we had a workshop back at DrupalCon Barcelona to add a lot more assertions to some of the existing tests and add some new tests. Quite a lot of people have been working on those over the past two or three months.

Those are starting to land, or be RTBC, or be in review. With some of the very early tests, we weren't asserting certain things like database queries, so it's just adding in the missing detail. There's also, not directly related to performance testing, but in terms of general Drupal performance improvements: we are building up to being able to use async in Drupal core.

ould add support for that to [: 00:31:00

So long listing queries in Views, which, on real sites, that is the thing, the killer. Once you've got lots of data, it's very easy to end up with some slow database queries, and you should optimize the queries. But what you can also do is fire 'em off, do other things, and then come back and get 'em later.

So for perceived performance, it's useful to be able to do a bit of parallel processing. It's not real parallel, but you know, async. It's early days, but we are making incremental progress towards it. So I think over the next year that's gonna be one of the bigger things coming through. And there are also cold cache situations.
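The "fire them off, do other things, come back and get them later" pattern is easiest to see with a small example. The sketch below uses Python's `asyncio` purely as a stand-in; it says nothing about how async will be implemented in Drupal core (which is PHP), and `slow_query` simulates database latency with a sleep.

```python
import asyncio
import time


async def slow_query(name: str, seconds: float) -> str:
    """Pretend to wait on the database; the sleep stands in for query latency."""
    await asyncio.sleep(seconds)
    return f"{name} done"


async def build_page() -> list[str]:
    # Both waits overlap, so the total is roughly the longest one, not the sum.
    return await asyncio.gather(
        slow_query("listing", 0.2),
        slow_query("sidebar", 0.2),
    )


start = time.perf_counter()
results = asyncio.run(build_page())
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")
```

Nothing runs in parallel on the CPU here, which matches the "not real parallel" caveat: the gain comes only from overlapping the time spent waiting.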

atabase queries where Drupal [: 00:32:00

But that was with warm caches. When you have a completely cold start, there are so many things, like plugins, the theme registry, the router, that have to be built before you can do things. Those take time. So if you've got quite a heavy site and you do a full cache flush, it can be a second, two seconds, five seconds before any page gets served.

And when you have large, high-traffic sites that want to do lots of deployments, that can cause outages. And async should give us a lot of tools to massively improve those kinds of situations. 'Cause instead of progressively going through the page request, with every stampeding request getting blocked on the same things, we should be able to slice and dice so that they can share the work.

thing that's in the critical [: 00:33:00

So I think, in terms of things that are in progress but aren't done yet, that's a big area that I think will come in over the next six to 12 months, and we should be able to performance test some of those things once they land as well.

Michael Meyers: Awesome.

I'm really excited. The progress you guys have made to date is really exciting. We'll have to do another catch-up in the near future. This is my morning, and you've already made my day. It's been so awesome to hear the progress that you've made, and everybody who's contributing to it, and the solutions.

Real quick before we wrap up: if there are developers out there that want to get engaged in this, are you looking for help? How do they find folks working on Gander and doing this kind of stuff?

Nathaniel Catchpole: Yeah. So there's, if you look at the Drupal core

u tagged a lot of the issues [: 00:34:00

We can link to that in the summary, and there's full documentation for getting set up locally as well. If you want to write performance tests for a module or a distribution, then there's instructions to get started with that.

Michael Meyers: Fantastic. Catch, thank you so much for taking the time to walk us through all this. You really did make my day. I'm smiling. For everybody who's listening, we'll do our best to put a bunch of links into the show notes so that you can check out what we talked about. If you liked this talk, please remember to upvote, subscribe, and share it out.

but they liked it. So, we'd [: 00:35:00

That's three Ts, for Tag1 Team Talks. And we'd love your feedback and episode ideas. Please write to us at ttt@tag1.com. Again, when folks write in with things, it really is great. So, I appreciate you spending time with us today, Catch. Thank you. And we will talk soon. Take care.

Nathaniel Catchpole: Cheers. Thanks for having me on.
