Moving from Drupal 7 to Drupal 10: Managing Complex File and Media Migrations - Tag1 Team Talks
Episode 113 • 31st January 2024 • Tag1 Team Talks | The Tag1 Consulting Podcast • Tag1 Consulting, Inc.

Shownotes

Join us for a dynamic and timely discussion in the latest episode of Tag1 Team Talks, where our Drupal migration experts, including Janez Urevc, Strategic Growth and Innovation Manager at Tag1, alongside Drupal mavens Lucas Hedding and Mauricio Dinarte, dive into the nuances of media and file migration from Drupal 7 to Drupal 10. This conversation is crucial as Drupal 7 approaches its end of life and Drupal 10 emerges.

This episode offers a rich exploration of the evolving media landscape in Drupal, addressing key challenges in migrating both local and remote media, as well as inline embedded content. Our guests also share practical tips on efficient file transfer using Rsync, managing extensive file libraries, and overcoming specific hurdles in remote media migrations and cloud storage options like S3.

This episode is an invaluable resource for anyone embarking on a Drupal migration journey, packed with expert anecdotes, real-world examples, and problem-solving strategies from their own migration experiences. Get ready to enhance your knowledge and skills in Drupal migrations, especially in handling complex media and file transfers.

Don't miss out – listen in and arm yourself with the expertise to master your Drupal migration challenges!

Transcripts

[:

[00:00:35] That's why we are bringing you this series of talks diving deep into the world of Drupal migrations. And who's better to guide us than Tag1's very own Drupal migration experts? From the masterminds and maintainers of Drupal's migration tooling, to the individuals behind the most groundbreaking Drupal migrations, we've got an all star lineup.

[:

[00:01:23] Let's dive in. I'm Janez Urevc, Strategic Growth and Innovation Manager here at Tag1 and a long time contributor to Drupal. And I'm joined today by the two well known top contributors to Drupal, Lucas Hedding, one of the five current Drupal Migrate Core subsystem maintainers, and Mauricio Dinarte, Drupal Migrations expert and author of the 31 Days of Migration series.

[:

[00:01:54] Lucas Hedding: You're welcome.

[:

[00:02:17] Tell us about that.

[:

[00:02:45] Except it wasn't, it was a lot of things. And that model is a bit different in Drupal 10. Uh, it got pulled into Drupal core. It, um, matured and has a different setup. The main part is that, uh, we can still migrate between the two. You just have to do it right. And there's a lot of contrib modules that, um, will help us with this, and we'll talk about those through the course of our time today.

[:

[00:03:32] Lucas Hedding: Yeah. The file_managed table in the database, with files on the local hard drive.

[:

[00:03:40] When you migrate, you generally need to migrate twice, quote unquote, uh, first into managed files and then into media entities to be able to use media in a media library and whatnot. But sometimes, um, people decide not to even use media entities. [00:04:00] Um, and this is also something to consider before we start a migration. And I'm pretty sure Mauricio that, uh, you have experience with that. So what are your thoughts on this decision?

[:

It is not a choice that you need to make for the whole site. Like, you can, uh, port some of the fields to media entities and other fields into regular file fields as before. So, one thing that you might want to consider is whether you depend for any reason on per-field configuration. Uh, then you might go with regular files because at the moment, um, if you have one single, um, media type for images, for example, everything is going to go into that bucket and, uh, however that field itself is configured, the same rules would apply, uh, in terms of where files are located.

[:

[00:05:53] Another example was, uh, uh, a break that had, uh, a publication workflow, and they specifically didn't want to use media library because they could expose, um, you know, files and documents that should not be seen by people, uh, uploading files, uh, as part of the process. Again, like, there are considerations around permissions, around, uh, file locations, if that is important for you.

[:

[00:06:44] And, uh, you will be tweaking the process, um, pipeline, um, because, you know, being different entities, there will be different properties that you need to map. And one last thing about this is that if you enable validation, um, which is generally a good idea in a migration, sometimes things are not obvious in the context of a single migration.

[:

[00:07:40] So you are going to get an error in the media migration for something that you kind of overlooked in the file migration. So, especially when working with validations, you need to be mindful about configurations in dependent, uh, migrations.

[:

[00:08:00] Um, I, I never thought about it, but it makes total sense. Um, when you were talking about per field configuration, I, I had this thought. Like, I guess you could, if you have different fields in Drupal 7 and you still want to keep the distinction, you could create separate, like, let's assume that they are all the same file type.

[:

[00:08:55] And I think that then you could also configure them to be stored in a separate location. Maybe that would be one way of solving that. Um, and this was exactly the reason why we came up with this idea of media source plugins and media types, uh, to cover cases like this, when you have, um, the same type of asset, but you want to have a different type, uh, for different items of that asset.

[:

[00:09:41] Mauricio Dinarte: And one more comment on this is that we might be familiar with the, uh, media types that are provided by the standard installation profile. You know, there are five out of the box, but two things, one, you are not, you know, bound to only those five.

[:

[00:10:18] So if you're using something else as your, as your base, as your starting point for a migration into Drupal 10, you might either have to recreate them manually or you could use the same configuration that comes with the standard installation profile. Just want to highlight that because one of the very first times that I noticed that.

[:

[00:10:48] Janez Urevc: That's a really good point. And just to mention here, the source plugins that we have available in core: uh, we have file, like a generic file, image, audio [00:11:00] file, and a video file. And here with video file, we're not talking about things that are remote embeddable like YouTube or Vimeo or something like that, but like a literal video file dot web.

[:

[00:11:44] But as core evolved, I think that it's safe to say that nowadays, um, we can, we can cover a lot just by using functionality provided in

[:

[00:12:01] Lucas Hedding: Thanks.

[:

[00:12:19] How, how does that affect migration?

[:

[00:12:50] The, uh, the source of the data is a SQL query where you have to look in different tables, look at different places in the database, and there are source plugins that pull together both of those things. So one for the media and one for the files. And, and then it allows you to map them to the new destinations in Drupal 10.

[:

[00:13:52] Of, uh, media from Drupal 7 into Drupal 10.

[:

[00:14:29] And if you were using media entities before, it would just like copy the configuration automatically and move over the data. It will also take care of things within the WYSIWYG, as you said before, uh, if you had embeds, if you have even, uh, uh, links to other entities, it will try to detect them and just like make the connections automatically under the hood.

[:

[00:15:15] And if, you know, for different reasons, like you have a new content model, a new set of structure, you need to make some changes to how the entities are like, uh, configured, um, you might have to go a custom route. And in that case, what we normally do is install the module, do not use it for the automatic migration of configuration, but instead just use the process plugins that are provided, so that in my custom migration, I am able to, you know, do these transformations like the embeds in the WYSIWYG, but adapted to my new content model that is, you know, being migrated, uh, with the custom migrations. So that's probably the one that you might look at first.

[:

[00:16:51] Janez Urevc: Great. Um, another thing that I have experience with is, uh, handling large libraries. And please correct me if I'm wrong, but I think that generally, when you're migrating files, Migrate will check if a file exists at the expected destination. And if it doesn't, it will try to copy it from the source where it should be. Um, and if you have a large library, um, that could take quite a long time.

[:

[00:17:54] Lucas Hedding: Um, It's not default.

[:

[00:18:21] Mauricio Dinarte: Yeah, so it is important you mentioned like the strategy that you mentioned is actually very helpful because it's going to speed up the process of the migration.

[:

[00:18:59] If not, the default is that if the file exists, it is going to, uh, rename it, I think. So, uh, just like make a copy with, uh, uh, an appendix, uh, like underscore zero, underscore one, underscore two. So you need to use that, uh, that thing. Another thing that I want to point out is that, um, it is common in a migration, uh, to be importing and rolling back the migrations because you are testing things.
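The rename-on-collision behavior described here can be modeled in a few lines. This is an illustrative Python sketch, not Drupal's code: the function name `rename_if_exists` and the example URIs are made up for the illustration, and the `_0`, `_1`, `_2` numbering simply follows the speaker's description.

```python
import posixpath


def rename_if_exists(uri, existing):
    """Return a destination URI that does not collide with an existing file.

    Hypothetical helper: if the URI is taken, try name_0.ext, name_1.ext, ...
    """
    if uri not in existing:
        return uri
    stem, extension = posixpath.splitext(uri)
    counter = 0
    # Keep counting up until a free name is found.
    while f"{stem}_{counter}{extension}" in existing:
        counter += 1
    return f"{stem}_{counter}{extension}"


already_there = {"public://photo.jpg", "public://photo_0.jpg"}
print(rename_if_exists("public://photo.jpg", already_there))
```

The practical consequence is that re-running a file migration against a pre-populated directory can quietly create numbered duplicates unless the copy step is configured to reuse or replace existing files.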

[:

[00:19:54] And in particular with the Entity API, rolling back an entity means, uh, deleting the entity, and, uh, deleting a file entity not only deletes the record from the database, it also deletes the file from the file system. So if you already copied, you know, 10 gigabytes of data and you rolled back the migration, um, you're going to lose all the files.

[:

[00:20:40] So

[:

[00:20:47] Janez Urevc: Thank you for sharing your secret

[:

[00:21:06] But if you are just working locally and you want to speed up the process, another trick that I do is that in the file migration, I do not use the file copy plugin. I just literally map the URI from the source to the destination. And then I use Stage File Proxy to only fetch files as needed. Again, like you need to be mindful about these things because in the final migration, you need to do it properly.

[:

[00:21:46] Janez Urevc: And that's why we have these Team Talks, to share the little dirty tricks that we use.

[:

[00:22:03] Lucas Hedding: Uh, No, I mean, they're private for a reason, so you're not going to be able to get access to them, but you still want to move them over. Right. Right. Um, yeah. So, just like Mauricio said with the, not actually doing the file copy.

[:

[00:22:41] Rsync the files over, otherwise it's death by a thousand paper cuts. And there's really no good way to know that they actually got over there. I mean, HTTP is a terrible way to sync things. If you try to do a request using HTTP, which is what happens when we're doing the file copy, um, one out of a thousand requests will fail.

[:

[00:23:27] I don't, I don't know of any large file migration where someone hasn't used Rsync. Um, there's even like, if you're using Pantheon, Pantheon has a Terminus Rsync plugin to make this really easy. Um, yeah, make your life a lot easier and, and do it the right way.
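As a rough illustration of the Rsync approach, the sketch below assembles an rsync command line in Python. The host name and both paths are placeholders invented for the example, and the flag selection is one reasonable choice rather than anything prescribed in the conversation.

```python
import shlex


def build_rsync_command(source, destination, dry_run=False):
    """Assemble an rsync invocation for copying a files directory."""
    command = [
        "rsync",
        "--archive",          # recurse and preserve permissions and times
        "--compress",         # compress data in transit
        "--partial",          # keep partially transferred files for retries
        "--itemize-changes",  # list what changed, useful for verification
    ]
    if dry_run:
        command.append("--dry-run")  # report only, copy nothing
    return command + [source, destination]


# Placeholder source and destination; adjust to the real environments.
command = build_rsync_command(
    "deploy@old-site.example.com:/var/www/d7/sites/default/files/",
    "web/sites/default/files/",
    dry_run=True,
)
print(shlex.join(command))
```

Passing the list to `subprocess.run(command, check=True)` would execute it. Because rsync only transfers what differs, the same command can be re-run until a dry run reports nothing left to copy.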

[:

[00:24:14] Files, like things like YouTube videos, um, Flickr images, Instagram photos, um, TED talks, uh, Vimeo videos, and general oEmbeds. You could have like SlideShare slides or something like that. Um, again, in Drupal 7, Media stored these as managed files. Um, in Drupal 10, we don't do that anymore. We store them as media entities using, uh, the correct source plugins.

[:

[00:25:01] Lucas Hedding: In some ways it's identical. It really is. However, because it's a new type of thing, a lot of times you get more requests on it. Can we filter out certain ones of them? Can we divide them? Can we chunk them up? Or, hey, we've got, for various reasons, these iframes directly embedded in CKEditor, and so we need to extract them, and use this new media thing, new media entity, and, and create a media entity for it, and then replace that with a media embed right in the CKEditor.

[:

[00:26:22] And when you're dealing with iframes, you can then do, uh, queries, direct, uh, DOM queries using, uh, like real XPath-query-style requests. You're not even doing regex then, you're doing, uh, manipulation. So you find the iframes that have the YouTube videos, and then you can create the media entity, insert it in the destination site.

[:

[00:27:15] And it's actually not all that complicated to do and makes the customer so much more happy when everything's converted over to media.
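The iframe-to-media-embed conversion can be sketched with a DOM-style pass instead of regular expressions. The example below uses Python's `xml.etree.ElementTree` as a stand-in for the DOM tooling mentioned above, so it assumes the body is a well-formed fragment; real rich text often is not and would need a more forgiving parser. The `media_uuid_for` callback is hypothetical and stands for whatever step creates or looks up the media entity for a given video URL.

```python
import xml.etree.ElementTree as ET


def replace_youtube_iframes(body, media_uuid_for):
    """Swap YouTube iframes in a well-formed fragment for media embed tags."""
    root = ET.fromstring(f"<root>{body}</root>")
    # Snapshot the elements first so the tree can be edited while looping.
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            src = child.get("src", "")
            if child.tag == "iframe" and "youtube.com/embed/" in src:
                embed = ET.Element("drupal-media", {
                    "data-entity-type": "media",
                    "data-entity-uuid": media_uuid_for(src),
                })
                embed.tail = child.tail  # keep text that followed the iframe
                parent[index] = embed
    parts = [root.text or ""]
    parts += [
        ET.tostring(element, encoding="unicode", short_empty_elements=False)
        for element in root
    ]
    return "".join(parts)


body = '<p>Watch:</p><iframe src="https://www.youtube.com/embed/abc123"></iframe>'
converted = replace_youtube_iframes(body, lambda src: "hypothetical-uuid")
print(converted)
```

Iframes pointing anywhere other than YouTube are left untouched, which keeps the conversion conservative when the source content is messy.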

[:

[00:27:42] Again, like the Migrate API is just interacting with other, um, APIs of Drupal itself, and in this case, the Entity API. So, um, if you have a large migration, whether it uses remote media or not, you can disable the generation of [00:28:00] thumbnails and, like, defer that to a later stage that can happen on cron. Point being, that generation doesn't happen during the migration, and it is specifically important for remote media because you will be pinging an external service to get the thumbnail itself.

[:

[00:28:39] Lucas Hedding: That brings up, uh, uh, remembrance with the local files too. So local files, if you say, ah, we don't need to pull over the file sizes. Or, um, well, see, this was the issue. This is why I remember it. We were pulling over the files, the file sizes, just because that's what we dutifully did. Except, uh, Drupal, Drupal's entity API for files was ignoring the fact that that, that the file sizes were already provided to it during the migration.

[:

[00:29:40] But you still have to make sure that you pass the file sizes over. Otherwise, it's still going to, uh, try to figure out what the size of the file is and store that in the database.

[:

[00:29:55] Lucas Hedding: Yes. All of the details.

[:

[00:30:20] Mauricio Dinarte: Um, YouTube and Vimeo are supported out of the box. Um, in general, Drupal supports oEmbed, uh, but if you want something other than YouTube and Vimeo, and that provider gives you the option to fetch from oEmbed, uh, you can enable more. And there are other modules, uh, one is called Video Embed Field, that will give you like a very long list of more providers that you can pick from.

[:

[00:31:20] Um, what about the inline or embedded media in CKEditor? We, we touched on that earlier, um, but how does that affect migration? Um, as far as I know, like the, the embed tag is definitely different. Um, so we need to handle that. Uh, what else might we consider while doing that?

[:

[00:31:57] Uh, and there is a configuration that you can, uh, you can toggle, but again, like Media Migration is trying to automatically convert all of those as needed. Um, if you're not doing like a 1 to 1 migration, uh, you can still use the module without using the source plugins for generating the configuration.

[:

[00:32:50] Um, for this project, it was a Drupal project already. It was a Drupal 7 to Drupal 10 upgrade at that point. Um, but they also wanted to consolidate other properties that they have, and some of those were in WordPress. So we were migrating WordPress posts and their associated images and files. Uh, what we did was install, uh, a plugin in the WordPress sites to be, to be able to export the configuration.

[:

[00:33:43] And from there, we generated CSV files that we migrated into Drupal. And in this case, we were actually using, uh, the, uh, Migrate Media Handler module, uh, because it, it provided very good reference process plugins. So again, I, I really recommend also looking at that one, uh, especially when you are migrating from other things outside of Drupal.

[:

[00:34:17] Lucas, maybe we can briefly talk about the architecture of embedding in D10, because it's different than D7, because in D7, MediaModule did everything. What do we have in Drupal 10? I know this is not strictly migration related, but it's still useful knowledge for the context.

[:

[00:35:40] Uh, the code, the short code, or like the, the token that is embedded is slightly different, uh, for, for entities versus media. Uh, there's a lot of similarities. Um, you just got to get the right, right short code to, to get inserted into your, your CKEditor, to then go out and find the right thing to load and embed with the right view mode and all the other options that would get passed along to the thing that you're embedding. Okay.

[:

[00:36:40] Lucas Hedding: It does,

[:

[00:37:00] But entities, entities, you can find that, you know, it was called node 1, 2, 3 on the old site, and now it's called maybe node 2, 3, 4 on the new site. And let's just connect the dots here and replace it with the right embed code.
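The "connect the dots" step for entity references amounts to a lookup from old IDs to new IDs. The sketch below rewrites `/node/<id>` paths in a text field using a plain dictionary. In a real Drupal migration that mapping would come from the migration's own ID map; the function name and the sample markup here are assumptions for illustration only.

```python
import re

NODE_PATH = re.compile(r"/node/(\d+)\b")


def remap_node_paths(text, id_map):
    """Rewrite /node/<old id> paths to /node/<new id> using id_map."""
    def swap(match):
        new_id = id_map.get(int(match.group(1)))
        # Leave the path alone when the old ID was never migrated.
        return match.group(0) if new_id is None else f"/node/{new_id}"

    return NODE_PATH.sub(swap, text)


old_to_new = {123: 234}
print(remap_node_paths('<a href="/node/123">Read more</a>', old_to_new))
```

Leaving unmapped IDs unchanged makes it easy to search the migrated content afterwards for references that still point at source-site IDs.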

[:

[00:37:36] Lucas Hedding: Well, you have to, it depends on the provider. If you're dealing with S3 itself from Amazon, then you want to minimize data transfers if you're on a budget. If you're not on a strict budget, then maybe you don't care. Uh, but you can clone buckets very easily. So you can have bucket one and bucket two.

[:

[00:38:41] Oops, don't want that to happen. So clone it and you do yourself a favor.

[:

[00:38:47] Lucas Hedding: That has not thankfully happened to me. Not exactly that. Although close enough that I've had scares. So clone it. Um, so that's if you're dealing with S3. Now, if you're dealing with another provider, Backblaze or, uh, even Cloudflare now has a new object storage.

[:

If that's where you've landed, or S3 itself, right? And then use S3 command to sync them up to the cloud. It's going to be so much faster. S3 command, uh, there's no API to do synchronous HTTP, uh, directly to S3 because S3 itself doesn't quite support it, but it's kind of multi-threaded. So it'll do up to like eight file uploads all at the same time with S3 command. So you're going to be a lot faster if you're dealing with 100 gigabytes of data or 200 gigabytes of data, uh, and your pocketbook will be a whole lot better, uh, the next month, once you've rolled over to S3, 'cause, uh, your S3-compatible storage is going to be way cheaper than local storage.
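The speed-up from running several uploads at once can be illustrated with a small thread pool. This is a generic Python sketch, not the tool discussed above: `upload_one` is a hypothetical callable standing in for whatever actually sends one file to the bucket, and the limit of eight workers simply mirrors the figure mentioned in the conversation.

```python
from concurrent.futures import ThreadPoolExecutor


def _attempt(upload_one, path):
    """Run one upload and report the error, if any, instead of raising."""
    try:
        upload_one(path)
        return path, None
    except Exception as error:
        return path, error


def upload_all(paths, upload_one, max_workers=8):
    """Upload every path, at most max_workers at a time; return failures."""
    failed = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda path: _attempt(upload_one, path), paths)
        for path, error in results:
            if error is not None:
                failed.append((path, error))
    return failed


def fake_upload(path):
    # Stand-in for a real upload call; only pretends to fail on one name.
    if path.endswith(".broken"):
        raise OSError(f"could not upload {path}")


failures = upload_all(["a.jpg", "b.jpg", "c.broken"], fake_upload)
print([path for path, _ in failures])
```

Collecting failures instead of stopping at the first error lets a second pass retry only the files that did not make it.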

[:

[00:40:50] Janez Urevc: Yeah.

[:

[00:41:12] And then they switched to S3 for that reason.

[:

[00:41:38] It just was a bad, bad, overly used name to move the files from local to S3. That's now part of S3FS itself. And it's still, even in S3FS, not really migrate in the sense of Drupal Migrate, but it still lets you move things from local to S3. That's one thing. The other thing, uh, around S3 is there's an S3FS CORS module.

[:

[00:42:40] So now we've got timeouts for PHP, uh, of 60 seconds or 30 seconds and max post size and all these wonderful things. Maybe you're hosted on Pantheon and it blocks a file that's over a hundred, you know, all of these things. S3FS CORS will let you post [00:43:00] directly to the S3 bucket. And so then all of those, I think S3, um, providers have limits on how long it'll take to upload the file, but they're like.

[:

[00:43:26] Janez Urevc: Great. Thank you. Um, and let's, let's quickly discuss, uh, migrating media from remote sources. And here we're, we're not talking about things like YouTube videos and, you know, embeddable things, but actual file assets that we get from remote sources, usually through HTTP requests. Um, and Migrate API will generally copy those over.

[:

[00:44:13] What's your experience there?

[:

an external website, I've dealt with some really odd things where two, two days later, we finally figured out it was an SSL cert issue. And it was only when we ran the migration on Acquia, which had one version of OpenSSL, and the remote site that we were pulling from had another version of OpenSSL, that the certs were not working the way that we expected and the files were randomly not getting pulled over.

[:

[00:45:59] [00:46:00] And you'll have the joy of figuring it out. So just Rsync the stuff. I mean, that solves nine out of 10 problems, 99 out of 100 problems: Rsync the files.

[:

[00:46:48] Um, just like an iframe in, in your rich text field is generally a bad idea. Uh, something else that, again, I guess at this point, I'm going to take the, uh, just like multiple interpretations of remote files because Drupal itself can have, you know, remote files, uh, and specifically I'm talking about files that are not in the file_managed table. I have some projects that they use.

[:

[00:47:41] So that's another one that's kind of freaky, like having to like search in like basically the, the file system for things that are not managed by Drupal itself. And again, like speaking about communication with, between one environment and another, something that I have seen.

[:

[00:48:23] If you don't do that, uh, you can, you know, go around it by like providing, you know, uh, HTTP instead of HTTPS. Uh, and if it's a local environment, it might not be a big deal, but again, like, um, those things that happen during development and depending on what type of project you're working on, you are going to be dealing with this kind of thing.

[:

[00:49:00] Janez Urevc: Great. Thank you. And for the end, I, uh, thought that we could share any interesting experiences from the past that we've seen during migrations. Uh, Lucas, do you have any? I mean, I'm sure you do.

[:

[00:49:27] Um, but if you do do a file migration, there is still, after like probably eight years, uh, an open issue on drupal.org for a memory leak. It's just generally about a memory leak in Migrate, uh, during a migration, but, um, about every six months, someone will find that issue and post on it and say, Hey, I'm having this too.

[:

[00:50:22] And then the migration dies and it's usually during the file migration. So what are some workarounds there? Um, there's a batch mode, uh, that you can pass to your source plugins. it's all of these migrations, um, for the most part have, well, they all have a source plugin. There it's built into the, there's a couple other sessions.

[:

[00:51:08] And so for the file migration, do that. That'll help you. Um, the next thing is, uh, rather than if you're using a Drush migration, Drush migrate import, and then here's the name of the thing I want to import, do it on a per migration name, use like tags or a group or anything and say, migrate all the things all at once.
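The idea behind a batch mode on the source side is to read rows in fixed-size chunks instead of holding the whole result set in memory at once. The generator below is a generic Python model of that idea, not Drupal's implementation; the name `in_batches` and the batch size are arbitrary choices for the example.

```python
from itertools import islice


def in_batches(rows, batch_size):
    """Yield lists of at most batch_size rows from any iterable."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Pretend these are file rows coming from the source database.
for batch in in_batches(range(1, 8), batch_size=3):
    print(batch)
```

Only one batch is alive at a time, so peak memory depends on the batch size rather than on the total number of files.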

[:

[00:52:05] Um, and then one more last thing, uh, you spent all this time and you want to get a pristine report. There was 100 files on the old site and only 99 got moved over. According to the report from Drush Migrate or whatever, you're always going to have missing files, because these were files that were stored on the hard drive.

[:

[00:52:56] Uh, you had an incident or some, for some reason, there's always missing files. Always.
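Since some files recorded in the database will no longer exist on disk, it can help to list them before the migration runs so the final report holds no surprises. The sketch below checks a list of relative paths against a files directory; where the paths come from (for example, an export of the source site's file table) is left as an assumption, and the function name is made up for the illustration.

```python
from pathlib import Path


def find_missing_files(relative_paths, files_root):
    """Return the paths that have no matching file under files_root."""
    root = Path(files_root)
    return [path for path in relative_paths if not (root / path).is_file()]


# Placeholder inputs: paths as recorded by the old site, checked on disk.
recorded = ["images/logo.png", "docs/report-2012.pdf"]
print(find_missing_files(recorded, "sites/default/files"))
```

Sharing that list with stakeholders up front turns "99 of 100 files migrated" from a failure into an expected, documented result.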

[:

[00:53:41] The problem in this case was that, um, the high watermark as of today, and there is an issue for this, um, is set up very early in the process. Uh, when, when you're processing the row. And because of the out of memory error, um, you know, the, the, the high watermark has already been saved in the database as if the row were, was, uh, processed correctly, but it failed because of the memory error.

[:

[00:54:33] And as it is a good idea to have, um, migration dependencies in place, then any other migration that depends on files is going to be blocked because you are one file short in your file migration. So in this case, it took some debugging to figure out what was going on, what was the cause, and then reset the high watermark value to be able to migrate that missing file.
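The high watermark problem described here comes down to ordering: if the marker is saved before a row is processed and the process then dies, the next run skips that row. The simulation below is a simplified Python model of that ordering issue, not the Migrate API's code; every name in it is hypothetical, and a `MemoryError` raised once stands in for the out-of-memory crash.

```python
def run_migration(rows, state, process, save_mark_first):
    """Import (timestamp, name) rows newer than state['high_water']."""
    imported = []
    for timestamp, name in sorted(rows):
        if timestamp <= state["high_water"]:
            continue  # treated as already handled by a previous run
        if save_mark_first:
            state["high_water"] = timestamp  # saved before the work is done
        try:
            process(name)
        except MemoryError:
            break  # the run dies here, like an out-of-memory crash
        if not save_mark_first:
            state["high_water"] = timestamp  # saved only after success
        imported.append(name)
    return imported


def fail_once_on(target):
    """Build a processor that crashes the first time it sees target."""
    pending = {target}

    def process(name):
        if name in pending:
            pending.discard(name)
            raise MemoryError(name)

    return process


rows = [(1, "a.jpg"), (2, "b.jpg"), (3, "c.jpg")]

state = {"high_water": 0}
process = fail_once_on("b.jpg")
first_run = run_migration(rows, state, process, save_mark_first=True)
second_run = run_migration(rows, state, process, save_mark_first=True)
print(first_run, second_run)
```

With `save_mark_first=True`, `b.jpg` is never imported on the rerun even though processing it would now succeed, which matches the "one file short" symptom. Resetting the stored marker, as described above, is what lets the skipped row through.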

[:

[00:55:31] Janez Urevc: Interesting.

[:

[00:55:58] Um, and it's, um, it was not a migration. It's from the, the times when I worked at Examiner.com, which was the largest Drupal website on the internet at the time. Um, it was a D7 site. Uh, but this was a few years after the migration to Drupal 7 happened. Um, but we were switching data centers, uh, because we, we, we had our own hardware, but we switched the data center provider.

[:

[00:56:53] So even Rsync doesn't solve every problem. I found it hilarious that it was actually easier to do it over snail mail with hard drives. Then

[:

[00:57:10] Janez Urevc: Yeah. Just, just to catch up for sure. But, uh, yeah. Rsync is there.

[:

[00:57:26] Our goal is to put one out per week over the next few months to support the community in the migration process. Um, performance, we touched on performance today and it's something that we care deeply about at Tag1.

[:

[00:58:15] And other talks that we are planning to do include topics like porting custom code from Drupal 7 to 10, the future of the Migrate tooling, how to port the team and so much more. So we hope that you'll tune in and enjoy our upcoming team talks.

[:

[00:58:50] So Mauricio, can you tell us a bit more about the new upcoming series?

[:

[00:59:18] It is actually like a real project that we will be migrating together. Um, both content and configuration will be migrated, but more important than the technical part is also like giving advice. Like before writing the first migration, before executing the first command, we're going to discuss things like understanding, you know, the tool that you're going to use, the Migrate API, because it is probably the most popular one, but by no means the only one.

[:

[01:00:03] Janez Urevc: We're all looking forward to it.

[:

[01:00:31] Write us at ttt@tag1.com. Big thanks to our two guests today and to everyone who tuned in. Thank you for joining us. Thank you.
