Join us in this lively episode of Tag1 Team Talks, where our seasoned experts Mike Ryan (co-creator of Migrate) and Benji Fisher (maintainer of Migrate), along with host Janez Urevc, unpack the final "Load" segment of the ETL (Extract, Transform, Load) process crucial for Drupal migrations. As they dive into the nitty-gritty, they shed light on the intriguing core mechanics of loading data into Drupal, bringing to the fore the remarkable pluggability of Drupal's migration system. Mike Ryan reveals how diverse destination plugins can turn this system into a migration powerhouse.
At the heart of our discussion, the essence of performance optimization isn't lost as our experts explain the magic behind handling one entity at a time during migration. The conversations are peppered with insights that make the daunting migration task feel like a breeze.
We also feature upcoming engaging talks aimed at easing the community's transition from Drupal 7 to Drupal 10. Dive in! This will be an enlightening ride filled with hearty laughs and profound takeaways!
Welcome to Tag1 Team Talks, brought to you by Tag1 Consulting.
Speaker:With Drupal 7 and Drupal 9 rapidly approaching end of life, we are
Speaker:hearing people talk about migrating and upgrading more than ever before.
Speaker:And anyone who's ever been involved with a large-scale migration,
Speaker:migrating a large site or application from one technology stack to another,
Speaker:will tell you that it's complex, time consuming, and it demands expertise.
Speaker:That's why we are bringing you this series of talks, diving deep into
Speaker:the world of Drupal migrations, and who better to guide us than Tag1's
Speaker:very own Drupal migration experts.
Speaker:From the masterminds and maintainers of Drupal's migration tooling, to the
Speaker:individuals behind the most groundbreaking Drupal migrations, we've got an all
Speaker:star lineup who'll cover everything you need to know about every aspect
Speaker:of migrating large scale applications.
Speaker:This team talk is part of a three-part series about the ETL (Extract,
Speaker:Transform, and Load) process, which is used by many enterprise migration
Speaker:systems, Drupal's Migrate included.
Speaker:In today's episode, we're going to talk about how to use Drupal's
Speaker:Migrate system to Load data into the destination system, which is usually
Speaker:Drupal, but not necessarily always.
Speaker:Be sure to stick around to the end because we're, uh, also going to announce
Speaker:the next few talks in our series.
Speaker:Let's dive in.
Speaker:I'm Janez Urevc, senior engineer here at Tag1 and a long time contributor to
Speaker:Drupal, and I'm joined today by well known top contributors to Drupal, Benji
Speaker:Fisher, one of the five current Drupal Migrate core subsystem maintainers,
Speaker:and Mike Ryan, co creator of Migrate.
Speaker:Welcome, and thank you for joining me.
Speaker:In case you didn't already watch or listen to the previous two episodes in
Speaker:this series about E, Extract, and T, Transform, I'd suggest that you do so.
Speaker:In the first episode, we, among other things, provided a high level
Speaker:overview of what ETL stands for.
Speaker:So, we're not going to cover this today, and we will dive directly
Speaker:into today's topic, which is...
Speaker:L, Load.
Speaker:So Benji, could you tell me, and our audience, of course, what
Speaker:is being done as part of the Load process and how specifically
Speaker:is that done in Drupal Migrate?
Speaker:Sure, so almost all of the time, um, the load phase is
Speaker:going to be creating entities.
Speaker:Um, Drupal is structured with, um, you know, a fairly consistent content model.
Speaker:So we can have configuration entities and we can have content entities.
Speaker:So content entities are things like taxonomy terms, nodes, users.
Speaker:Um, and, and we'll be creating each of those in a separate migration.
Speaker:Um, each migration should have just a, a single destination type, so
Speaker:you'll want separate migrations in the same project, one, one for each
Speaker:type of entity you're creating.
Speaker:Um, the configuration entities are often settings.
Speaker:Um, but, uh, but blocks, for example, each block is a configuration entity,
Speaker:and you could have one, one migration in your project to create block entities.
Speaker:Um, but there, uh, there are other things. Those are the most common ones,
Speaker:but, um, the entire migration system is pluggable. Um, in the load phase
Speaker:we have destination plugins, and there are alternatives to, um, to entities:
Speaker:you might be creating a custom database table, um, and there's, uh, I think one
Speaker:example we'll be talking about later where we're actually migrating into
Speaker:the Drupal state system, which, uh, if you drill down a little bit, turns out
Speaker:to be the key value table in Drupal.
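To make the idea of interchangeable destinations concrete, here is a minimal Python sketch. It is not Drupal's actual plugin API: the class names (`Destination`, `EntityDestination`, `KeyValueDestination`) and the `import_row` and `run_load` functions are hypothetical, chosen only to show how the same load loop can write to an entity-like store or to a key-value store.

```python
from abc import ABC, abstractmethod


class Destination(ABC):
    """Hypothetical stand-in for a destination plugin (not Drupal's real API)."""

    @abstractmethod
    def import_row(self, row: dict) -> list:
        """Store one processed row and return the destination IDs it received."""


class EntityDestination(Destination):
    """Creates one in-memory 'entity' per row and assigns it a new serial ID."""

    def __init__(self, entity_type: str):
        self.entity_type = entity_type
        self.storage = {}
        self._next_id = 1

    def import_row(self, row: dict) -> list:
        new_id = self._next_id
        self._next_id += 1
        self.storage[new_id] = dict(row)
        return [new_id]


class KeyValueDestination(Destination):
    """Writes each row into a key-value store, loosely like a state system."""

    def __init__(self):
        self.store = {}

    def import_row(self, row: dict) -> list:
        self.store[row["key"]] = row["value"]
        return [row["key"]]


def run_load(rows, destination: Destination) -> list:
    """Load rows one at a time through whichever destination was plugged in."""
    return [destination.import_row(row) for row in rows]
```

The load loop never needs to know which kind of destination it is talking to, which is the property the speakers describe as pluggability.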
Speaker:One would think that, you know, Drupal Migrate would only need one destination
Speaker:for its migrations, which is Drupal.
Speaker:Um, but Mike, you, when you were designing Migrate, you decided
Speaker:to make it pluggable in general, but also make destinations pluggable.
Speaker:Um, can you talk about the reasoning, like why, why you left the door open
Speaker:to, to run migrations that, that store data into anything, basically?
Speaker:Well, um, well, the main thing was that at the time we originally developed Migrate,
Speaker:which was Drupal 6, uh, basically every module that managed content in Drupal had
Speaker:its own database table, its own schema.
Speaker:There was no general purpose entity system.
Speaker:So basically each type of data in Drupal needed its own, uh, destination plugin.
Speaker:Um, and so, um, it was just natural to use the same sort of plugin system
Speaker:as we're using for the extractor.
Speaker:Um, and of course, once you've got that flexibility, you can start
Speaker:to think of other ways to use it.
Speaker:For example, you can have a CSV, uh, destination plugin to export
Speaker:data using the migration system.
Speaker:If you want to pull, um, pull Drupal data out into, uh, a format importable
Speaker:into something else, you can write a migration that extracts your Drupal data,
Speaker:transforms it into the, um, proper, uh, format, and then a loader that dumps it.
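As an illustration of using the same extract, transform, load shape for export, the following Python sketch writes transformed rows to CSV text. The function name `export_rows_to_csv` and its parameters are assumptions made for this example; it does not reproduce any existing Drupal CSV destination plugin.

```python
import csv
import io


def export_rows_to_csv(rows, transform, fieldnames):
    """Run each source row through `transform` and dump the result as CSV text.

    Hypothetical helper: `rows` plays the extractor, `transform` the process
    step, and the CSV writer plays the role of the loader.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in rows:  # one row at a time, as in the rest of the pipeline
        writer.writerow(transform(row))
    return buffer.getvalue()
```

For example, passing a `transform` that trims whitespace from titles would yield a clean CSV file ready for import into another system.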
Speaker:And even, even inside Drupal now we have, as far as I know, different classes
Speaker:for different entity types, right?
Speaker:Like different destination classes.
Speaker:For nodes, for taxonomy terms.
Speaker:Right.
Speaker:There is a, you know, they're built on a general, um, entity
Speaker:destination, but many, um, there is, of course, a big difference between
Speaker:content entities and configuration entities, but also even among those,
Speaker:a number of the, um, of entity types need a little bit of special handling.
Speaker:Uh, for example, users, you have to deal with, uh, passwords.
Speaker:So the person writing the migration doesn't necessarily know whether
Speaker:there's any special processing.
Speaker:They know they're creating
Speaker:entities of type user.
Speaker:So they say content entity, colon user.
Speaker:And if there is special processing, there'll be a special, um, load
Speaker:class to, to manage that, and if not, it'll just fall back to the
Speaker:generic content entity of type user.
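The fallback described here can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Drupal's plugin manager: the sketch assumes plugin IDs of the form `entity:<type>`, and the class names, the `SPECIAL_DESTINATIONS` registry, and the `pass_is_hashed` flag are all hypothetical.

```python
class GenericContentEntity:
    """Hypothetical generic destination used when no special class exists."""

    def __init__(self, entity_type: str):
        self.entity_type = entity_type

    def prepare(self, row: dict) -> dict:
        return dict(row)


class UserEntity(GenericContentEntity):
    """Hypothetical special case: users need extra handling for passwords."""

    def prepare(self, row: dict) -> dict:
        values = super().prepare(row)
        # Assumption for illustration: mark the password as already hashed
        # so that it is not hashed a second time on save.
        values["pass_is_hashed"] = True
        return values


# Only entity types that need special processing get their own class.
SPECIAL_DESTINATIONS = {"user": UserEntity}


def resolve_destination(plugin_id: str) -> GenericContentEntity:
    """Pick a special class if one is registered, else fall back to generic."""
    prefix, _, entity_type = plugin_id.partition(":")
    if prefix != "entity" or not entity_type:
        raise ValueError(f"unsupported destination: {plugin_id}")
    cls = SPECIAL_DESTINATIONS.get(entity_type, GenericContentEntity)
    return cls(entity_type)
```

The person writing the migration only names the entity type; whether a special class exists is resolved behind the scenes.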
Speaker:Right.
Speaker:And, um, we, we haven't touched on that so far, but perhaps we should: most
Speaker:of your migration logic, such as it is, will be implemented in simple YAML files.
Speaker:Basically, migrations are configuration.
Speaker:Um, if the existing plugins serve the needs of your particular, uh, migration
Speaker:application, then basically you just write a bunch of YAML. In
Speaker:serious migrations, you often need to write an occasional, um, transformer
Speaker:in PHP, but most of your work is just YAML, so it's very readable.
Speaker:Very simple to put together.
Speaker:That isn't really part of the Load talk,
Speaker:but we should cover it somewhere.
Speaker:Um, when we were preparing for this episode, you also mentioned that, um,
Speaker:when we are storing entities, we do one entity at a time,
Speaker:and there are very good reasons for that.
Speaker:Can you also talk about that a little bit?
Speaker:Well, the whole pipeline, uh, handles one entity or one
Speaker:logical piece of data at a time.
Speaker:And, um, there are multiple reasons for that.
Speaker:One big one is to handle references between, uh, entities, let's say, if you
Speaker:have a link from one node to another or a link from a node to a taxonomy term.
Speaker:And historically these, um, the unique identifiers for Drupal entities have been
Speaker:serial, uh, fields, um, serial numbers.
Speaker:And when you're creating new entities on your new system, it's, you're,
Speaker:you're going to end up with new numbers.
Speaker:If you really, really insist on it and work hard at it,
Speaker:you may be able to preserve your IDs, but this is, it's not recommended.
Speaker:It's much simpler to, um, rewrite the references. And to rewrite the references,
Speaker:you can't do it in bulk, because you won't know the new reference number.
Speaker:The pipeline does it one at a time so that you migrate one entity, it gets its
Speaker:new number, we keep track of the mapping from its old ID to its new ID, and then
Speaker:when it's time to migrate the reference,
Speaker:we can fill in the new ID and everything is still pointing
Speaker:where it's supposed to be.
Speaker:Um, you know, we talked more about this in the
Speaker:Transform talk before this.
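The reference rewriting described above can be sketched as follows. This is a simplified Python illustration, not Drupal's ID map implementation: the function name `migrate_with_id_map` and the row fields (`id`, `title`, `parent`) are hypothetical, and the sketch assumes that a referenced row is always migrated before the row that points at it.

```python
def migrate_with_id_map(source_rows):
    """Migrate rows one at a time, recording old ID -> new ID as we go.

    Each row may reference an earlier row through its optional 'parent'
    field; the reference is rewritten to the new ID at load time.
    """
    id_map = {}        # old source ID -> new destination ID
    destination = {}   # new destination ID -> stored values
    next_id = 1        # the destination hands out fresh serial numbers

    for row in source_rows:
        new_id = next_id
        next_id += 1
        old_parent = row.get("parent")
        destination[new_id] = {
            "title": row["title"],
            # Look up the new ID for the referenced row (None if no reference).
            "parent": id_map.get(old_parent),
        }
        id_map[row["id"]] = new_id

    return destination, id_map
```

With source rows whose old IDs are 17 and 42, where 42 points at 17, the second row ends up pointing at the new ID assigned to the first. A bulk insert could not do this, because the new ID is only known after the first row has been saved.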
Speaker:And another, I'm sorry.
Speaker:Yeah, go ahead Benji.
Speaker:Yeah, another reason to do it one entity at a time is that we want to leverage
Speaker:the other APIs that Drupal provides.
Speaker:The Entity API does not give us a method for creating 10 entities at a time.
Speaker:It gives us methods for creating one entity at a time.
Speaker:So just to manage the complexity of the Migrate API in
Speaker:Drupal core, that's a second reason for, for doing it one at a time.
Speaker:Now, in particular cases, um, if you've got a huge number of things
Speaker:that you're creating and you, uh, know that your migration
Speaker:is going to take hours or days,
Speaker:um, and you know that this particular part of your project isn't going to
Speaker:require the sort of references that Mike was talking about, then on a particular
Speaker:project, it might make sense to have, um, some, some custom code, a custom
Speaker:destination plugin, for example, that batches things 10 or 100 at a time.
Speaker:Um, but that's not going to go into Drupal core,
Speaker:because it won't always work and it would be a lot of added complexity.
Speaker:And, uh, one other reason to deal with one entity at a time is, uh, memory.
Speaker:If you've got a lot of data, you don't want to deal with the whole batch of data
Speaker:at once, and we are, uh, the, the Migrate system is very performance conscious.
Speaker:It's got some built-in memory, um, uh, sort of, uh, I'm not sure what you
Speaker:would call it, but it, it will recognize if you're running low on memory
Speaker:and, um, do some purging of internal caches and so on as needed to keep going.
Speaker:And if you're using, um, Drush to run your migrations as you should, it
Speaker:can, if necessary, um, respawn a new process, fresh process, if the, uh, if
Speaker:it's unable to reclaim enough memory.
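The memory safeguard described here can be approximated with the Python sketch below. It is only a conceptual illustration: the function name `process_with_memory_guard`, its parameters, the default threshold of 0.85, and the "restart" return value are assumptions for this example and do not reproduce the actual Migrate or Drush code.

```python
def process_with_memory_guard(rows, handle_row, memory_used, limit,
                              threshold=0.85, clear_caches=lambda: None):
    """Process rows one at a time, watching memory after each row.

    `memory_used` is a callable returning current usage in the same unit as
    `limit`. When usage passes `limit * threshold`, caches are purged; if
    that does not free enough memory, the loop stops and reports "restart"
    so that a caller could continue in a fresh process.
    """
    processed = 0
    for row in rows:
        handle_row(row)
        processed += 1
        if memory_used() > limit * threshold:
            clear_caches()
            if memory_used() > limit * threshold:
                # Could not reclaim enough memory: hand off to a new process.
                return processed, "restart"
    return processed, "done"
```

A caller that receives "restart" would start a new process and resume from the first unprocessed row, which is the behavior the speakers attribute to running migrations through Drush.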
Speaker:So Migrate will do that automatically, I mean Drush and Migrate will do
Speaker:that automatically behind the scenes without the developer initiating it?
Speaker:Yes.
Speaker:That's great.
Speaker:We are planning to have a talk on performance and I'm sure that we will
Speaker:talk about these sorts of things, uh, in detail in that episode.
Speaker:Um, Benji already mentioned core versus contrib.
Speaker:So what, what do we have in core
Speaker:in terms of, uh, the load step, and which interesting other things
Speaker:could we find in contrib space?
Speaker:So Core has, uh, support for migrating from Drupal 6 or Drupal 7 into modern
Speaker:Drupal, and so mostly that means entities.
Speaker:So nodes, taxonomy terms, users,
Speaker:blocks, um,
Speaker:and, and then, um, in contrib space, um, the sort of the, the most esoteric
Speaker:example I know of is a module called Commerce QuickBook WebConnect, which
Speaker:uses SOAP to, um, import data from QuickBooks into Drupal and to export
Speaker:data from Drupal to QuickBooks.
Speaker:And Lucas Hedding is one of the maintainers of that module
Speaker:and it's, it's, it's lightly used and I don't think there's a
Speaker:Drupal 10 compatible version yet.
Speaker:Um, but I used it on a recent project.
Speaker:And, and looked at it and it's, uh, it's very interesting in the way it
Speaker:uses migrate to export data from Drupal.
Speaker:And I, I think Mike, uh, suggested how this works earlier, but it, it goes
Speaker:through the, um, the orders one at a time and the, the load plugin it,
Speaker:it uses, or, or the, the destination plugin it uses for the load stage,
Speaker:um, exports data about a commerce order into the Drupal State system.
Speaker:Um, and then, um, a, it, it, it cuts off the migration after processing one row
Speaker:and then another part of the module takes over and extracts the data from the state
Speaker:system and generates a SOAP response,
Speaker:which then gets batched somehow.
Speaker:So, um, so as Mike said, you, you, you could be exporting
Speaker:to a CSV file or something.
Speaker:In this case, we're exporting to the state system.
Speaker:And then other parts of the module use that to get the data into QuickBooks.
Speaker:Um, less esoteric than that.
Speaker:Um, the only other sort of general purpose, uh, destination plugin I
Speaker:know is in the Migrate Plus module.
Speaker:There's, uh, an explicit, uh, destination plugin for a custom SQL table.
Speaker:So if you have custom database tables in your project that you
Speaker:need to migrate, um, you can use the, uh, SQL table plugin from,
Speaker:uh, Migrate Plus, and that doesn't use the entity system.
Speaker:It just writes directly to the SQL table.
Speaker:Um, I want to go back to the QuickBooks a little, a little bit, because I find
Speaker:this approach of migrating into the state system and then doing SOAP requests,
Speaker:um, very interesting.
Speaker:Do you, do you know why it was designed this way or should we get Lucas on,
Speaker:on Team Talks to explain that to us?
Speaker:So I, I've never asked him about it.
Speaker:Um, and he, he wasn't the original author of the module, but, uh,
Speaker:but he, he did some work on it.
Speaker:Um, but I'm, I'm pretty sure that the, uh, the reason they decided to
Speaker:use the migrate API for that is that it gives a way of tracking, um, the
Speaker:original entity ID and the exported ID.
Speaker:So Drupal has, um, as Mike mentioned, sequential IDs for each order.
Speaker:QuickBooks has its own way of keeping track of the orders.
Speaker:And, uh, the Migrate API provides a system for keeping track of which Drupal
Speaker:ID corresponds to which QuickBooks ID.
Speaker:And that's, uh, that's one, one of the, um, reasons for using, um, the
Speaker:Migrate API: if you do need to keep track of, uh, of old and new IDs,
Speaker:then that's one argument for using the Migrate API rather than just some sort
Speaker:of custom code for, for exporting data.
Speaker:That's a great point.
Speaker:I didn't think about it.
Speaker:Um, so I remember when, uh, we were still, like, the Drupal community
Speaker:was still developing Drupal 8.
Speaker:Um, it's been,
Speaker:you know, quite a long process, and, um, also the discussion about including
Speaker:Migrate in core, um, happened at that time, and then eventually the decision
Speaker:that we will use it to migrate from Drupal 7 to 8. Um, I remember that,
Speaker:um, back in those days, MongoDB, like the company behind the Mongo database, um,
Speaker:wanted to be like a first-class citizen for Drupal, like providing out-of-the-box
Speaker:support to run your, um, your Drupal site on Mongo instead of MySQL.
Speaker:And I've been involved with Mongo quite a lot at that time, because I
Speaker:was working at Examiner and Examiner was using, uh, MongoDB for Drupal 7.
Speaker:But in Drupal 7, you still had to use, uh, a MySQL database
Speaker:next to it.
Speaker:So you had two databases, two sets of, uh, of backups and all that.
Speaker:Um, so they wanted to provide the, the ability for Mongo to be the sole
Speaker:database for Drupal 8 and, and on.
Speaker:And I remember that, uh, Chx was working on that.
Speaker:Um, and he was really excited about Migrate because he realized that
Speaker:if we would be using Migrate as a standard to migrate from D7 to D8,
Speaker:you would basically just swap the destination plugin and instead of loading
Speaker:into MySQL, you would load into MongoDB.
Speaker:Um, but then, then the MongoDB company lost interest and, and, and stopped
Speaker:funding Chx to do that work.
Speaker:So that work was never completed.
Speaker:Um, it was like in a very early alpha stage.
Speaker:And the module is, is still on D.o.
Speaker:And I'm not sure what state it is in at the moment, but, um, A, it would
Speaker:have been very cool to have this possibility and, um, B, again, it proves
Speaker:that, um, having the Load part of Migrate pluggable is very useful.
Speaker:Uh, do you two have any other, like,
Speaker:unusual or interesting cases related to the L part, uh, that you've seen in the past
Speaker:or maybe any ideas how it could be used, but you've not seen it used that way yet?
Speaker:I almost always create entities.
Speaker:I don't think I have any other examples of clever uses of the Load stage.
Speaker:It is a little exotic, you know, beyond,
Speaker:you know, if you want to export some CSVs for some reason.
Speaker:Yeah, it could be used for exporting, like similar to how WordPress exports,
Speaker:or produce an XML you could use.
Speaker:Yeah.
Speaker:Although views export is probably easier for most of those cases.
Speaker:That's true.
Speaker:So I think that that's it for the L part.
Speaker:Um, this is also the end of the last, uh, episode in our ETL mini series.
Speaker:Uh, but we have some great team talks lined up.
Speaker:Uh, our goal is to put out one per week over the next few months to
Speaker:support the community in the migration process from Drupal 7 to Drupal 10.
Speaker:Um, and as part of that, we're planning to talk about performance, which is
Speaker:something we care deeply about at Tag1.
Speaker:Um, and of course it applies to migrations as well, especially if
Speaker:you're handling really large data sets.
Speaker:Um, a full data migration can easily take over 12 hours or even multiple days.
Speaker:Um, and we'll do a handful of talks on this topic, including how
Speaker:to profile and tune a migration.
Speaker:We'll also do a talk on incremental migrations,
Speaker:where you can include or exclude things, uh, and run a migration on a subset
Speaker:of data to make it perform better.
Speaker:And every project owner wants their migration to be a success.
Speaker:We will dedicate an episode to discuss the most important factors for a
Speaker:successful Drupal 7 to 10 migration in order to help successfully
Speaker:navigate your migration project.
Speaker:And other topics that we are planning to cover include porting custom code
Speaker:from Drupal 7 to Drupal 10, uh, the future of migrate tooling, how to
Speaker:port the theme and, uh, so much more.
Speaker:We, we hope that you'll tune in and enjoy our upcoming team talks.
Speaker:A huge thank you to the Tag1 Team.
Speaker:Thank you, Benji Fisher and Mike Ryan.
Speaker:Um, make sure that you check out the other segments in this series.
Speaker:There will be links to them in the show notes, along with all the
Speaker:other links that we mentioned today.
Speaker:If you like this talk, please remember to upvote, subscribe, and share it.
Speaker:Uh, you can check our past talks at tag1.com/ttt.
Speaker:That's three Ts for Tag1 Team Talks.
Speaker:As always, we'd love to hear your feedback and any topic suggestions.
Speaker:You can write us at ttt@tag1.com.
Speaker:A big thank you to both of our guests and to everyone who tuned in.
Speaker:Thank you for joining us.