Unraveling the ETL Data Migration Process - Understanding Transform - Tag1 Team Talk
Episode 110 • 9th January 2024 • Tag1 Team Talks | The Tag1 Consulting Podcast • Tag1 Consulting, Inc.
Duration: 00:38:28

Shownotes

Navigating the complex waters of Drupal migration can be daunting, but the latest episode of Tag1 Team Talks demystifies the process, offering invaluable insights for developers and IT professionals. Drupal experts from Tag1 Consulting, including Mike Ryan and Benji Fisher, delve into the transform phase of the ETL (Extract, Transform, Load) process. They compare the row-by-row approach of Drupal's Migrate system with traditional bulk processing, highlighting the flexibility and customization it offers for various data sources.

This podcast series is a deep dive into the mechanics and challenges of Drupal migration, and this episode addresses the crucial role of the transform phase. Here, data is reshaped and prepared for its new home in Drupal. The experts discuss performance considerations, stressing the impact of transformation efficiency on the overall migration timeline. They also explore Drupal-specific nuances, offering practical tips and strategies for a smooth transition.

Transcripts

Speaker:

Welcome to Tag1 Team Talks, brought to you by Tag1 Consulting.

Speaker:

With Drupal 7's end of life rapidly approaching and Drupal 9 already at end of life, we are

Speaker:

hearing people talk about migrating and upgrading more than ever before.

Speaker:

And anyone who's ever been involved with a large-scale migration, migrating

Speaker:

a large site or application from one technology stack to another, will

Speaker:

tell you that it's complex, time-consuming, and demands expertise.

Speaker:

That's why we're bringing you this series of talks,

Speaker:

diving deep into the world of Drupal migrations.

Speaker:

And who better to guide us than Tag1's very own Drupal migration experts.

Speaker:

From the masterminds and maintainers of Drupal's migration tooling to the

Speaker:

individuals behind the most groundbreaking Drupal migrations, we've got an all-star

Speaker:

lineup who'll cover everything you need to know about every aspect

Speaker:

of migrating large scale applications.

Speaker:

This team talk is part of a three-part series about the ETL (extract,

Speaker:

transform, and load) process, which is used by many enterprise migration

Speaker:

systems, Drupal's Migrate included.

Speaker:

In today's episode, we're going to talk about how to use Drupal's Migrate

Speaker:

system to transform the data before loading it into Drupal's database.

Speaker:

Be sure to stick around to the end because we are going to announce

Speaker:

the next few talks in our series.

Speaker:

Let's dive in.

Speaker:

I'm Janez Urevc, senior engineer here at Tag1, and a

Speaker:

longtime contributor to Drupal.

Speaker:

I'm joined today by two well-known top contributors to Drupal: Benji

Speaker:

Fisher, one of the five current Drupal Migrate core system maintainers.

Speaker:

And Mike Ryan, co-creator of Migrate.

Speaker:

Welcome.

Speaker:

Thank you both for joining me.

Speaker:

Thanks for having us.

Speaker:

We're glad to have you.

Speaker:

Before we dive in, I would just like to mention that in case you haven't

Speaker:

already watched or listened to the previous episode in this series about

Speaker:

E, Extract, we'd suggest that you do.

Speaker:

In that episode, among other things, we provided a high-level overview

Speaker:

of what ETL stands for, so we won't repeat that in this episode.

Speaker:

Now, finally, let's dive into today's topic, which is T, Transform.

Speaker:

Mike, could you tell us what is being done as part of the transform phase

Speaker:

in general and how Drupal does it?

Speaker:

Is it similar to how other enterprise systems do it, or are

Speaker:

there any specialties to it?

Speaker:

One difference from classic ETL is that classic ETL usually works in bulk.

Speaker:

You extract all your data into a big blob.

Speaker:

Then you run it through a transformer, which transforms everything.

Speaker:

And then you run it into a loader, which does a bulk load.

Speaker:

Our approach is to run through the data one logical row at a time.

Speaker:

We say row because most often we're dealing with databases as our sources.

Speaker:

But technically, it could be anything, like a web service or a CSV file.

Speaker:

Basically, we use the Drupal plugin system.

Speaker:

For a given pipeline, each field that is being run through

Speaker:

the pipeline can go through any number of transformers. Because

Speaker:

they're plugins, it's very flexible.

Speaker:

There's a YAML format you can use to write your migrations, which

Speaker:

specifies for each field, what plugins it's going to transform with

Speaker:

and whatever configuration you add.

Speaker:

So it takes the output of the source plugin. All the source plugins,

Speaker:

regardless of the source (CSV, et cetera),

Speaker:

produce a common data structure, which feeds into the transformer.

Speaker:

And the transform pipeline will take one row from that.

Speaker:

And it will go through each piece of the process (the transform step,

Speaker:

which we call "process" in Drupal) and apply the transformers.

Speaker:

And each transformer can actually take multiple fields from the

Speaker:

source row, or it can take none.

Speaker:

You might use a processor that simply sets a constant value.

Speaker:

And the transformers can be very flexible and for the most part,

Speaker:

they're not very Drupal dependent.

Speaker:

You'll do a lot of string manipulation, for example.
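
The row-at-a-time pipeline described above can be sketched in Python. This is a hypothetical illustration, not Drupal's API (real migrations declare their process pipelines in YAML and implement plugins in PHP); the step names get, constant, concat and trim are our own stand-ins, chosen to show that a step can read one source field, several fields, or none.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Row = Dict[str, Any]
# Each step receives the value produced by the previous step plus the whole
# source row, so it may read one source field, several, or none at all.
Step = Callable[[Any, Row], Any]


def get(field: str) -> Step:
    """Copy a single source field (hypothetical analogue of a 'get' step)."""
    return lambda _value, row: row.get(field)


def constant(value: Any) -> Step:
    """Ignore the source row entirely and emit a fixed value."""
    return lambda _value, _row: value


def concat(*fields: str, sep: str = " ") -> Step:
    """Read several source fields and join them into one string."""
    return lambda _value, row: sep.join(str(row[f]) for f in fields)


def trim() -> Step:
    """A plain string manipulation applied to the incoming value."""
    return lambda value, _row: value.strip() if isinstance(value, str) else value


def transform_row(row: Row, process: Dict[str, List[Step]]) -> Row:
    """Run every destination field's chain of steps over one source row."""
    out: Row = {}
    for dest_field, steps in process.items():
        value = None
        for step in steps:
            value = step(value, row)
        out[dest_field] = value
    return out


def run(rows: Iterable[Row], process: Dict[str, List[Step]]) -> Iterator[Row]:
    """Transform one logical row at a time instead of the whole data set."""
    for row in rows:
        yield transform_row(row, process)


process = {
    "title": [get("headline"), trim()],
    "author": [concat("first", "last")],
    "status": [constant(1)],
}
rows = [{"headline": "  Dessert recipes ", "first": "Ada", "last": "Lovelace"}]
print(list(run(rows, process)))
# [{'title': 'Dessert recipes', 'author': 'Ada Lovelace', 'status': 1}]
```

Each destination field gets its own ordered chain of steps, and the generator only ever works on one row, which is the contrast with bulk ETL drawn above.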

Speaker:

Let's see, I'm not sure what else there is to say about the general process.

Speaker:

One thing I'd like to add at this point is that sometimes we cheat.

Speaker:

We don't strictly follow the ETL paradigm.

Speaker:

Instead, we take a peek at the destination database.

Speaker:

So for example, we might look for an existing taxonomy term that

Speaker:

has the name dessert, or we might check how the editor is configured.

Speaker:

So that's one way in which it's very Drupal specific and doesn't

Speaker:

strictly follow the ETL paradigm.

Speaker:

You have complete flexibility.

Speaker:

You can do anything you want.

Speaker:

Good or bad.

Speaker:

Like always.

Speaker:

And while we're mentioning bad things to do in these processors,

Speaker:

it should be noted that this pipeline is run for each source row

Speaker:

in your data, and when dealing with multiple-value fields, it might run

Speaker:

several times for one source row.

Speaker:

So the processing pipeline is a key place to watch performance.

Speaker:

One slow processor can kill the overall migration process.
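
To see why a single slow step dominates, the sketch below counts how often a step runs for a multi-value field. The data and names (TERM_IDS, find_term_id, tags_step) are hypothetical, and memoizing repeated lookups is shown as one possible mitigation under that assumption, not as something prescribed in this talk.

```python
from functools import lru_cache

# Hypothetical destination data: existing taxonomy term IDs keyed by name.
TERM_IDS = {"dessert": 7, "cake": 9, "pie": 12}
lookups = 0  # how many times the "expensive" lookup actually ran
calls = 0    # how many times the process step was invoked per value


@lru_cache(maxsize=None)
def find_term_id(name: str) -> int:
    """Stand-in for a slow lookup, such as one database query per value."""
    global lookups
    lookups += 1
    return TERM_IDS[name]


def tags_step(values):
    """Runs once per value of a multi-value field, not once per row."""
    global calls
    ids = []
    for name in values:
        calls += 1
        ids.append(find_term_id(name))  # cache hits skip the slow path
    return ids


rows = [{"tags": ["dessert", "cake"]}, {"tags": ["dessert", "pie"]}, {"tags": ["cake"]}]
result = [tags_step(row["tags"]) for row in rows]
print(result, calls, lookups)  # [[7, 9], [7, 12], [9]] 5 3
```

Three rows already mean five step invocations; at millions of rows, any per-invocation cost multiplies accordingly.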

Speaker:

Makes sense.

Speaker:

Because it could be run a lot of times, and that adds up, right?

Speaker:

Oh, thousands, millions.

Speaker:

Yeah.

Speaker:

And it can take days as we discussed before.

Speaker:

So Benji, I've heard you say in the past that the transform stage is the most

Speaker:

interesting part of the migration, and I know for a fact that you are probably

Speaker:

the most excited about it in the whole ETL migration world.

Speaker:

Why is that?

Speaker:

Yeah you're right.

Speaker:

And this is something that I decided when I first started working on migrations.

Speaker:

And by the way, I am the most junior member of the current maintainers,

Speaker:

and I have a lot less experience than most of them, or than Mike, so I defer

Speaker:

to Mike on questions of experience and performance in large-scale migrations.

Speaker:

But I do have some pretty strong opinions about the transform

Speaker:

stage, the process plugins.

Speaker:

The first reason that it's the most interesting is that any migration

Speaker:

project will be broken up into a bunch of different migrations.

Speaker:

And each one of those migrations will have a single source and a single destination.

Speaker:

But any one migration has many fields.

Speaker:

So if you have a migration for your article nodes, it'll have a body field.

Speaker:

It'll have a couple of timestamps.

Speaker:

It might have taxonomy and images and so on.

Speaker:

Each one of those fields is going to have at least one process plugin.

Speaker:

One transformer, as Mike described them, and some fields will

Speaker:

have several transformations.

Speaker:

In that sense, it's where the most variety is.

Speaker:

Again, each migration has one source plugin and one destination plugin,

Speaker:

but can have many transformation plugins, or process plugins.

Speaker:

The second thing is that the transform stage, the process

Speaker:

plugins, is where you have the most opportunity for reusing your code.

Speaker:

So if you look at the source plugin, it has to understand whatever cruft

Speaker:

is involved in your source data, the site you're migrating from.

Speaker:

And the only time you're going to be able to reuse a source plugin is

Speaker:

if you have the same type of source.

Speaker:

So once you've written a source plugin for a WordPress XML file, you can reuse that.

Speaker:

And once you've written a source plugin for Drupal 6 or

Speaker:

Drupal 7, you can reuse that.

Speaker:

As for the destination plugin, you're almost always migrating into Drupal entities.

Speaker:

They could be taxonomy terms or nodes; menu links are entities,

Speaker:

and the core migration system already understands the destination.

Speaker:

So that's already done.

Speaker:

But getting from one to the other is, in my opinion, the interesting

Speaker:

part and the part that has the most opportunity for reusing code.

Speaker:

So that's why I think that the transform stage is the most interesting.

Speaker:

Yeah, for most migrations, you'll find that for the extract

Speaker:

and the load phases, you simply need to use

Speaker:

core plugins and some configuration.

Speaker:

You don't usually need to do very much PHP coding.

Speaker:

It's the process plugins where you're most likely to need to write your own

Speaker:

plugins and write your own application logic, because that's where

Speaker:

you're transmogrifying your data.

Speaker:

You can do the new system.

Speaker:

Although there are some people who prefer to do it all in the source plugin.

Speaker:

They'll just write all their custom PHP there and prepare everything

Speaker:

so that it's ready to be imported.

Speaker:

And again, I don't like that approach because you can't reuse

Speaker:

the code if you do it that way.

Speaker:

Yeah, it makes it way harder to reuse it.

Speaker:

It's also against the ETL paradigm, because then you're

Speaker:

basically throwing away this separation of different phases that

Speaker:

we're trying to introduce here.

Speaker:

So to be a little bit more concrete, what would be the most common

Speaker:

transform operations in a migration?

Speaker:

Like what would we do in transform process plugins?

Speaker:

Yeah, so by far the most common is just a straight copy.

Speaker:

You have a text field, and you pass it over to the new text field,

Speaker:

which often has the same field name.

Speaker:

Sometimes you decide to change that as part of your site redesign.

Speaker:

That's the most common.

Speaker:

And, that's almost not using a transform plugin at all.

Speaker:

It's technically using the get plugin, but it's not doing any transformation.

Speaker:

Another common thing is that your source has a comma separated list of values,

Speaker:

and you split that into pieces, and you convert each word into a taxonomy term ID.

Speaker:

So that's something that comes up pretty commonly.
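
A minimal Python sketch of the split-then-look-up chain. In Drupal this would typically be YAML that chains core's explode plugin with a lookup step; here VOCABULARY, the case-insensitive matching, and the choice to skip unknown words are hypothetical.

```python
from typing import Dict, List

# Hypothetical vocabulary: lowercase term name -> term ID on the destination site.
VOCABULARY: Dict[str, int] = {"dessert": 7, "cake": 9, "pie": 12}


def explode(value: str, delimiter: str = ",") -> List[str]:
    """Split a delimited source value into trimmed, non-empty pieces."""
    return [piece.strip() for piece in value.split(delimiter) if piece.strip()]


def to_term_ids(names: List[str], vocabulary: Dict[str, int]) -> List[int]:
    """Convert each word to a term ID, skipping words with no matching term."""
    return [vocabulary[n.lower()] for n in names if n.lower() in vocabulary]


source_value = "Dessert, cake,  ,Pie"
print(to_term_ids(explode(source_value), VOCABULARY))  # [7, 9, 12]
```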

Speaker:

Another really important one is that since Drupal deals with structured

Speaker:

data, you might have references to other nodes, other taxonomy terms

Speaker:

identified by their entity IDs.

Speaker:

And if those entity IDs are changing as they often do in a complex

Speaker:

migration, then you have to translate the old entity ID, the ID on the

Speaker:

source system, to the new entity ID.

Speaker:

And that's possible because the migration system keeps track

Speaker:

of the old and new entity IDs.

Speaker:

So that's a really important one.
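
Drupal keeps this bookkeeping in per-migration map tables, and core's migration_lookup process plugin reads them. The class below is a hypothetical, in-memory simplification of that idea, not the real storage schema.

```python
from typing import Dict, Optional


class MigrationMap:
    """Simplified map table: source ID -> destination ID for one migration."""

    def __init__(self) -> None:
        self._map: Dict[int, int] = {}

    def record(self, source_id: int, dest_id: int) -> None:
        """Remember which new ID a source record was saved as."""
        self._map[source_id] = dest_id

    def lookup(self, source_id: int) -> Optional[int]:
        """Translate an old ID, or return None if it was never migrated."""
        return self._map.get(source_id)


# While importing articles, the (hypothetical) loader assigns new node IDs.
articles = MigrationMap()
for old_nid, new_nid in [(123, 456), (124, 457)]:
    articles.record(old_nid, new_nid)

# Later, a migration with a reference field translates the old ID.
source_row = {"related_article": 124}
print(articles.lookup(source_row["related_article"]))  # 457
print(articles.lookup(999))                            # None
```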

Speaker:

Something else you might want to do is make your site

Speaker:

better as you're transforming it.

Speaker:

So if you see that people are consistently using the CSS classes font-bold, size-large,

Speaker:

and color-red, well, you can replace those with mytheme-warning.

Speaker:

And suddenly your CSS markup is a lot more semantic and a lot

Speaker:

easier to maintain in the long run.
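
A sketch of that cleanup that treats the class attribute as a list of tokens rather than running a regular expression over raw HTML. RULES and semantic_classes are hypothetical; locating the class attributes in the markup is a separate DOM step not shown here.

```python
from typing import Dict, FrozenSet, List

# Hypothetical rule: this exact combination of presentational classes
# becomes one semantic theme class.
RULES: Dict[FrozenSet[str], str] = {
    frozenset({"font-bold", "size-large", "color-red"}): "mytheme-warning",
}


def semantic_classes(class_attr: str) -> str:
    """Replace any complete combination of presentational classes."""
    classes: List[str] = class_attr.split()
    for combo, replacement in RULES.items():
        if combo <= set(classes):
            # Drop the presentational classes and append the semantic one.
            classes = [c for c in classes if c not in combo] + [replacement]
    return " ".join(classes)


print(semantic_classes("font-bold size-large color-red intro"))  # intro mytheme-warning
print(semantic_classes("font-bold intro"))                       # font-bold intro
```

Requiring the whole combination avoids rewriting markup where an editor used only one of the classes on purpose.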

Speaker:

Another common one is to convert date formats, like maybe they're in year

Speaker:

month day format, and you want to convert it to a timestamp or vice versa.
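
Core provides a format_date process plugin (configured with from_format and to_format) for this. The Python below only illustrates the conversion itself, under the assumption that source dates mean midnight UTC; real data may need an explicit time zone.

```python
from datetime import datetime, timezone


def ymd_to_timestamp(value: str) -> int:
    """Convert 'YYYY-MM-DD' to a Unix timestamp, assuming midnight UTC."""
    parsed = datetime.strptime(value, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


def timestamp_to_ymd(value: int) -> str:
    """The reverse direction, again interpreted in UTC."""
    return datetime.fromtimestamp(value, tz=timezone.utc).strftime("%Y-%m-%d")


print(ymd_to_timestamp("2024-01-09"))  # 1704758400
print(timestamp_to_ymd(1704758400))    # 2024-01-09
```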

Speaker:

And then there are a whole bunch of utility operations.

Speaker:

And you wouldn't think of these as the things you want to do to your data, but

Speaker:

they're the things that end up getting used, in the middle of the process.

Speaker:

So flatten an array, combine several arrays into one, filter out empty

Speaker:

values, or apply a callback function.

Speaker:

So those I think are the most commonly used process plugins.

Speaker:

Mike, am I leaving anything out?

Speaker:

Yeah, I think that those are the key ones.

Speaker:

I'm looking now at the list of all the ones that are in core and

Speaker:

we might want to highlight a few other interesting ones here.

Speaker:

While you're looking at it, I just wanted to comment.

Speaker:

The callback one is interesting because it almost

Speaker:

lets you cheat a little bit.

Speaker:

If you need to introduce custom logic, but you don't want to

Speaker:

create a plugin and go through all that,

Speaker:

you can always use the existing callback process plugin

Speaker:

and then just create a function in PHP, which will be called.

Speaker:

Or use a basic PHP function.

Speaker:

You don't need to wrap trim in a plugin.

Speaker:

You simply use callback, specify trim as the callback, and boom, you've got it.
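
Core's callback plugin is configured with a callable key naming an existing PHP function, such as trim. The hypothetical Python helper below shows the same idea: pass any existing function instead of writing a dedicated plugin, with str.strip standing in for PHP's trim.

```python
from typing import Any, Callable


def callback(value: Any, func: Callable[[Any], Any]) -> Any:
    """Apply an existing function to the value instead of writing a plugin."""
    return func(value)


print(repr(callback("  Chocolate cake \n", str.strip)))  # 'Chocolate cake'
print(callback(["b", "a"], sorted))                      # ['a', 'b']
```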

Speaker:

Maybe this is a good time to mention that our show notes include some

Speaker:

links to the documentation where we list all of the plugins that are in

Speaker:

core, and those will be available on our pages after we publish this talk.

Speaker:

One of the interesting ones is static map.

Speaker:

It's basically like translating enums: if the source field contains a

Speaker:

finite list of distinct strings, and those need to be different on the Drupal side,

Speaker:

you use the static map plugin, which says change this string to that string, and

Speaker:

that's very handy in a lot of cases.

Speaker:

Or if you're dealing with NFL team names and the Redskins are now the

Speaker:

Commanders, you can say that this finite list of names has changed, and

Speaker:

anything else passes through unchanged.

Speaker:

Yeah.

Speaker:

And the Cleveland Guardians in baseball.
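
A sketch of static-map behavior with passthrough. The passthrough flag loosely mirrors the bypass option of core's static_map plugin; TEAM_RENAMES and this function are illustrative, not the plugin's implementation.

```python
from typing import Dict

TEAM_RENAMES: Dict[str, str] = {
    "Washington Redskins": "Washington Commanders",
    "Cleveland Indians": "Cleveland Guardians",
}


def static_map(value: str, mapping: Dict[str, str], passthrough: bool = True) -> str:
    """Translate a value from a finite list; optionally keep unknown values."""
    if value in mapping:
        return mapping[value]
    if passthrough:
        return value
    raise KeyError(f"No mapping for {value!r}")


print(static_map("Washington Redskins", TEAM_RENAMES))  # Washington Commanders
print(static_map("Chicago Bears", TEAM_RENAMES))        # Chicago Bears
```

With passthrough=False, an unmapped value raises instead, which surfaces unexpected source data early rather than importing it silently.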

Speaker:

I'm seeing sub process, and that is one of the more complicated ones that allows

Speaker:

you to do some really complicated things.

Speaker:

When a field consists of a list, an array, it allows you to basically

Speaker:

have a sub process pipeline for the pieces of this source field.

Speaker:

And this is very complicated and technical.

Speaker:

I'm not going to go through it right now because I always have to relearn it when I need to use it.

Speaker:

Will a subprocess use the same set of plugins as the main migration?

Speaker:

With source and transform and all those things?

Speaker:

Oh, no.

Speaker:

The source, rather than being a row from your source plugin,

Speaker:

is the contents of the field, the extracted field.

Speaker:

So it's used on fields, which themselves have structure.

Speaker:

But it does use, or have access to, all the same process plugins that

Speaker:

the general transform stage has.

Speaker:

Which obviously is immensely powerful, then.

Speaker:

It is.

Speaker:

It is.

Speaker:

In theory, instead of migrating your taxonomy terms... well, that's a bad example because we've got a

Speaker:

shortcut for that. But user accounts: in theory, instead of doing them

Speaker:

in a separate migration from your main content migration, you could

Speaker:

do them dynamically within a sub process within your content migration.

Speaker:

We do not recommend that. Like I said, the process plugins are very

Speaker:

powerful, and you can cut yourself.
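
A hypothetical Python analogue of core's sub_process plugin. The point it shows: inside the sub-pipeline, the "row" is one element of a structured field rather than a row from the source plugin, and ordinary steps run against it. The field data and the +1000 "ID translation" are made up for illustration.

```python
from typing import Any, Callable, Dict, List

Item = Dict[str, Any]


def sub_process(items: List[Item], process: Dict[str, Callable[[Item], Any]]) -> List[Item]:
    """Run a small pipeline over each element of a structured field.

    Inside the sub-pipeline the 'source row' is one element of the field,
    not a row from the source plugin.
    """
    return [{dest: step(item) for dest, step in process.items()} for item in items]


# A hypothetical multi-value image field extracted from one source row.
images = [{"fid": 10, "alt": " Lemon tart "}, {"fid": 11, "alt": ""}]
result = sub_process(images, {
    "target_id": lambda item: item["fid"] + 1000,  # pretend ID translation
    "alt": lambda item: item["alt"].strip() or "Untitled",
})
print(result)
# [{'target_id': 1010, 'alt': 'Lemon tart'}, {'target_id': 1011, 'alt': 'Untitled'}]
```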

Speaker:

There is a plugin for copying files, which matters if you're going

Speaker:

from one system to another, an old version of Drupal to a new one:

Speaker:

you want your images, your videos, your documents to come across too.

Speaker:

And the file copy plugin is very flexible because it gives you a

Speaker:

few different options for doing that, and for doing that performantly.

Speaker:

For example, it could simply copy it into Drupal's public files directory.

Speaker:

And it can keep track.

Speaker:

You can set a flag on it so that if the file already exists at the destination, you don't overwrite it.

Speaker:

And that's great for your performance when you're rerunning migrations,

Speaker:

especially during development.
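
The "don't overwrite if it already exists" behavior, sketched in Python. Core's file_copy plugin exposes this through its file_exists option (for example the 'use existing' value); copy_file and demo are hypothetical names, and this is not the plugin's implementation.

```python
import os
import shutil
import tempfile


def copy_file(source: str, destination: str, use_existing: bool = True) -> str:
    """Copy a file unless it is already present at the destination.

    With use_existing=True, a re-run skips files that are already in place,
    which keeps repeated development runs cheap.
    """
    if use_existing and os.path.exists(destination):
        return "skipped"
    os.makedirs(os.path.dirname(destination) or ".", exist_ok=True)
    shutil.copyfile(source, destination)
    return "copied"


def demo() -> list:
    """Copy the same file twice inside a throwaway temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "old", "photo.jpg")
        os.makedirs(os.path.dirname(src))
        with open(src, "wb") as fh:
            fh.write(b"jpeg bytes")
        dest = os.path.join(tmp, "public", "photo.jpg")
        return [copy_file(src, dest), copy_file(src, dest)]


print(demo())  # ['copied', 'skipped']
```

This is also why pre-seeding the destination (for example with rsync) speeds up a run: most files then take the cheap existence check instead of a copy.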

Speaker:

You can also... now I'm trying to remember the other option, but of course you could use it to copy directly into the files directory.

Speaker:

You could copy it to an S3 bucket.

Speaker:

Or some other storage service.

Speaker:

Or, I think I had a use case in the past where we needed to copy files from

Speaker:

another website into our local file system as part of the migration.

Speaker:

And I think that file copy was used for that as well, which

Speaker:

is obviously terribly slow.

Speaker:

And a little tidbit is that the file copy relies on the download

Speaker:

plugin if the source is remote.

Speaker:

And that uses Guzzle in a way that's slightly different from anywhere else

Speaker:

it's used in Drupal Core and caused some interesting test failures years ago.

Speaker:

Yes, sometimes you have to be clever to make things

Speaker:

work and work performantly.

Speaker:

Yes.

Speaker:

I also remember that during one of the migrations I was working on, we used file copy to copy

Speaker:

files straight from NFS, probably, to another public files

Speaker:

folder, and we were actually copying the files, and that slowed the migration a lot.

Speaker:

And then we figured out that it's better to rsync before running the migration,

Speaker:

and then the check for whether the file exists kicks in, and you don't need to

Speaker:

copy; you just find it there, and that sped up the migration significantly.

Speaker:

But we're getting into performance considerations now, which is

Speaker:

another talk we will be doing in the future. What about contrib?

Speaker:

What kind of interesting process plugins can we find in contrib

Speaker:

that are not part of core?

Speaker:

Oh, so many.

Speaker:

So many, I need to jog my memory here and take a look.

Speaker:

So the Migrate Plus module is an add on to the core migration system, and

Speaker:

it has a number of interesting ones.

Speaker:

There are several plugins for manipulating and scanning a DOM, a document object model.

Speaker:

So you can scan your HTML or XML and find and extract the span with the

Speaker:

text-bold class that's underneath a p, for example, if you need to

Speaker:

manipulate that piece of your content.

Speaker:

There are entity lookup and entity generate, which make it easy

Speaker:

to find a matching entity, one that you didn't necessarily migrate and so can't find via

Speaker:

the map tables that migration provides.

Speaker:

But if you're migrating into a system where you've got, maybe, a

Speaker:

taxonomy that you want to hook up to,

Speaker:

you can use entity lookup to find a matching term in that vocabulary and link to it.

Speaker:

And you can also use entity generate, which does the same thing, but also,

Speaker:

if it doesn't find the matching term, creates it for you.
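
Migrate Plus's entity_lookup and entity_generate plugins query real Drupal entities; the in-memory Vocabulary class below is only a hypothetical stand-in to show the difference in behavior between the two.

```python
from typing import Dict, Optional


class Vocabulary:
    """Hypothetical in-memory stand-in for existing terms on the destination."""

    def __init__(self, terms: Dict[str, int]) -> None:
        self.terms = dict(terms)
        self.next_id = max(self.terms.values(), default=0) + 1

    def lookup(self, name: str) -> Optional[int]:
        """Like entity lookup: find a matching term, or None if there isn't one."""
        return self.terms.get(name)

    def generate(self, name: str) -> int:
        """Like entity generate: the same lookup, but create the term if missing."""
        existing = self.lookup(name)
        if existing is not None:
            return existing
        new_id = self.next_id
        self.terms[name] = new_id
        self.next_id += 1
        return new_id


tags = Vocabulary({"Dessert": 7})
print(tags.lookup("Dessert"), tags.lookup("Soup"))   # 7 None
print(tags.generate("Soup"), tags.generate("Soup"))  # 8 8
```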

Speaker:

Let's see, there's file blob.

Speaker:

If you've got file data in a database blob, you can convert

Speaker:

that to a real file with that.

Speaker:

File blob reminded me of the beginning of my career, which predates

Speaker:

Drupal, where I had experience with a proprietary CMS that was really into

Speaker:

storing all files in the database.

Speaker:

That was fun.

Speaker:

Yeah.

Speaker:

So those are the ones that pop out to me immediately.

Speaker:

Besides Migrate Plus, which is a grab bag of several different plugins, there

Speaker:

are several other contributed modules that have plugins of all sorts.

Speaker:

And before you go writing your own plugins, take a look through the contrib

Speaker:

modules that are available on drupal.org, and you might find someone that's already solved your problem.

Speaker:

Maybe they've got a SOAP plugin.

Speaker:

Actually, I know they do because I wrote it. But whatever your scenario

Speaker:

is, assume you are not that unique until you prove you are.

Speaker:

And Migrate has been around for years, probably more

Speaker:

than a decade at this point.

Speaker:

And it's migrated many enterprise, large-scale applications.

Speaker:

So I'm almost convinced that if there is a use case, it has

Speaker:

probably already been done.

Speaker:

Yeah, it started out as a contributed module.

Speaker:

Mike and Moshe Weitzman developed it in Drupal 5 or Drupal 6?

Speaker:

6.

Speaker:

Okay.

Speaker:

It was 6.

Speaker:

I think we may have started trying it for 5, and we jumped ahead to 6 because 6 was

Speaker:

more conducive to what we were doing.

Speaker:

Yeah.

Speaker:

My first experience with it was the Drupal 7 version.

Speaker:

Yeah.

Speaker:

The first big project, the first client you would know, was The Economist,

Speaker:

Economist.com, way back in the day.

Speaker:

Which I think also sponsored a lot of initial Migrate module work, right?

Speaker:

Yes.

Speaker:

Yes.

Speaker:

After The Economist, it was Examiner.com.

Speaker:

Which also sponsored a lot of D7 work, right?

Speaker:

Yes, they sponsored most of our port to D7.

Speaker:

Martha Stewart Living was about that time too.

Speaker:

Speaking of history, do you have any anecdotes, or any interesting

Speaker:

or unusual process-related use cases that you've experienced in the past?

Speaker:

Oh, boy, they're all jumbled together.

Speaker:

One I don't want to remember is the time a client thought we had created

Speaker:

a major security breach because our development migration suddenly started

Speaker:

sending out emails to all their customers.

Speaker:

And this was the migration, and this is something to watch out for.

Speaker:

The migration system actually explicitly disables the mail system while

Speaker:

running, which we thought was safe. But what happened was that a module was enabled

Speaker:

which, during entity creation (which happens during migration), queues emails

Speaker:

to be sent. It was fine for a while, because this was a development system

Speaker:

that no one saw. But one little ping on port 88 to that system caused cron to run

Speaker:

(it was using the lazy cron, or whatever you call it), and boom, those emails

Speaker:

started going out and caused quite a stir.

Speaker:

So yes, this is something you do need to be careful about: side

Speaker:

effects, whatever effects the ultimate website might have beyond itself.

Speaker:

Be careful that you control them within your development and testing system.

Speaker:

Which is good advice in general, not just for migrations.

Speaker:

Yes.

Speaker:

And this is where I find DDEV, which is one of the projects that we are

Speaker:

really excited about at Tag1, really useful, because I believe that DDEV will

Speaker:

reconfigure your development environment to redirect emails into, like, this...

Speaker:

MailHog.

Speaker:

MailHog.

Speaker:

It basically redirects everything in there, and I think it even stays just in memory,

Speaker:

so if you are inside DDEV, with regards to email you can be pretty sure that

Speaker:

no matter what's going on, you're safe.

Speaker:

And it's handy too for testing your outgoing emails, testing

Speaker:

the formatting or whatever.

Speaker:

Yeah, exactly.

Speaker:

That's probably the use case it was originally created for.

Speaker:

Yeah.

Speaker:

But you'll see that, as a side effect, it also provides a layer

Speaker:

of security and peace of mind.

Speaker:

There are two points here.

Speaker:

I want to make sure we don't lose track of them.

Speaker:

The first is that while you're developing, no emails will be sent out.

Speaker:

But the second one is equally important.

Speaker:

You have to look at the emails that did get captured by MailHog

Speaker:

because those are the ones that will be sent out in real life when

Speaker:

you're on production and not local.

Speaker:

Yeah, make sure your tokens are being substituted, all that stuff.

Speaker:

Yes.

Speaker:

Benji, what about you?

Speaker:

Do you have any interesting stories from the past?

Speaker:

Yeah.

Speaker:

And, I'm really flattered that when Mike was going through the list of plugins in

Speaker:

Migrate Plus the first ones he singled out were the DOM processing plugins,

Speaker:

because that was one of my contributions.

Speaker:

And let me call out that this was on a project for Pega Systems, and I was

Speaker:

working for Isovera at the time with Marco Villegas, and we developed the

Speaker:

first DOM processing plugins, and both Isovera and Pega were supportive of

Speaker:

contributing that back to Migrate Plus. And I guess the original problem I

Speaker:

was trying to solve is that, as I said earlier, the node IDs were changing.

Speaker:

So if we had separate entity reference fields, Drupal could already handle that

Speaker:

just by using the migration lookup plugin.

Speaker:

And you could say that an article used to be node 123;

Speaker:

in the migrated system,

Speaker:

it's node 456.

Speaker:

You can do that translation, but what if you have a text field and

Speaker:

inside that text field, there's an anchor link, and the anchor href

Speaker:

goes to node/123?

Speaker:

How do you translate that to node/456?

Speaker:

And part of the answer was to use proper DOM processing.

Speaker:

And so, I had the idea on an earlier project,

Speaker:

but I didn't get to make it happen until it was actually needed here.

Speaker:

Everyone knows that you shouldn't be processing HTML with regular expressions.

Speaker:

But people do it anyway.

Speaker:

Yes.

Speaker:

People do it anyway because it's the tool they know and it's convenient.

Speaker:

And so the first step was to introduce some process plugins to make it

Speaker:

easy to do proper DOM processing so that you have less overhead in

Speaker:

creating that DOM document object and the XPath object and so forth.

Speaker:

Once you've eliminated the overhead, it is often both simpler and more reliable to

Speaker:

do the proper HTML processing rather than to do things with regular expressions.
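
A small sketch of the link-rewriting problem done with a parsed tree rather than regular expressions. Migrate Plus's DOM process plugins do this kind of work with PHP's DOM classes; the Python below uses the standard library's ElementTree instead, which assumes well-formed (XHTML-style) markup, and NODE_MAP is hypothetical map-table data.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical map-table contents: old node ID -> new node ID.
NODE_MAP: Dict[str, str] = {"123": "456"}


def rewrite_node_links(fragment: str, node_map: Dict[str, str]) -> str:
    """Rewrite <a href="/node/OLD"> links to the new node ID via a parsed tree.

    Assumes a well-formed fragment; ElementTree is an XML parser, so
    real-world HTML would need a proper HTML parser instead.
    """
    root = ET.fromstring(f"<div>{fragment}</div>")
    prefix = "/node/"
    for link in root.iter("a"):
        href = link.get("href", "")
        old_id = href[len(prefix):]
        if href.startswith(prefix) and old_id in node_map:
            link.set("href", prefix + node_map[old_id])
    # Serialize the children (with their tail text) without the wrapper <div>.
    inner = "".join(ET.tostring(child, encoding="unicode") for child in root)
    return (root.text or "") + inner


html = '<p>See <a href="/node/123">our recipe</a>.</p>'
print(rewrite_node_links(html, NODE_MAP))
# <p>See <a href="/node/456">our recipe</a>.</p>
```

Links to nodes that are not in the map are left untouched, so unmigrated targets can be reported separately instead of being silently mangled.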

Speaker:

In fact, in the Search API module, I just ran into a case where

Speaker:

it's not only simpler and more reliable, it's also more performant to do proper DOM processing,

Speaker:

and there's an open issue on the Search API module that handles that.

Speaker:

So anyway that was my original purpose for putting the DOM processing

Speaker:

plugins in on that project with Pega.

Speaker:

Since then, some other people have done work on that.

Speaker:

There's a contrib module that builds on the DOM processing plugins, and it

Speaker:

handles the case where you have the Media module on your Drupal 7

Speaker:

site, and you want to migrate that to the core Media module in Drupal 10.

Speaker:

It understands the tokens that the Drupal 7 Media module used and

Speaker:

handles transforming your text fields.

Speaker:

I had a really complicated project where we not only had to migrate

Speaker:

the site from Drupal 7 to Drupal 8 (I think it was Drupal 8 at that point),

Speaker:

We also had to import some really complicated XML documents into the site.

Speaker:

And that project gave me a real appreciation for the expressive

Speaker:

power of XPath, because that was the only way to manage these

Speaker:

really complicated XML structures.

Speaker:

And luckily we already had the DOM processing plugins available.

Speaker:

Another complicated project I had was that we had these HTML text fields, and

Speaker:

each text field had just some image tags.

Speaker:

So basic HTML markup, and we wanted to download the files from those image

Speaker:

tags, save them as files, create media entities out of them, and then

Speaker:

just insert the media references into the text field.

Speaker:

I did that with a custom PHP plugin.

Speaker:

And I do want to point out that this is the kind of thing where you can

Speaker:

shoot yourself in the foot.

Speaker:

It's not following the ETL paradigm.

Speaker:

You don't have a row creating each one of those media items.

Speaker:

And it does have certain disadvantages because it breaks the ETL paradigm,

Speaker:

but it is a practical way to handle that sort of situation.

Speaker:

Another weird one I had was a single HTML page in the source site, which was Drupal 7, and looking

Speaker:

at this page you would think, oh, this page is a view.

Speaker:

It's listing the person content type.

Speaker:

But in fact, it was just a basic page and all the markup was

Speaker:

just there in the body field.

Speaker:

And we wanted to pick it apart and create person nodes and then

Speaker:

create a view of the person nodes.

Speaker:

And so luckily the markup was consistent.

Speaker:

It always started with an H3 tag.

Speaker:

It had a title and that was immediately followed by an image tag.

Speaker:

So there was that consistency that I could take advantage of.

Speaker:

I extracted the title into a text field.

Speaker:

I extracted the image, created a file media entity, and then just

Speaker:

stripped those from the body field,

Speaker:

and let an actual view in Drupal (I think it was Drupal 9 at that point)

Speaker:

put the pieces back together to make something like the original source site.

Speaker:

And the last one that I noted down was again for Pega and we

Speaker:

were importing documentation from an external XML based system.

Speaker:

So this wasn't a site migration, this was a recurring migration that someone

Speaker:

was writing the documentation in this external system, and we had to import

Speaker:

it into the Drupal site and make it look like it fit the rest of the site.

Speaker:

And that's where I did the sort of thing I talked about before, where you look for

Speaker:

some consistent pattern of CSS classes and say, okay, we're going to replace

Speaker:

that with something more semantic.

Speaker:

And again, this used the DOM plugins.

Speaker:

And it also peeked at the current database, the destination database, to

Speaker:

see how the editor module was configured, so that we could pick and choose

Speaker:

the CSS classes that the current site editors would naturally

Speaker:

be adding through the user interface.

Speaker:

And we added those same CSS classes programmatically through the migration.

Speaker:

So those are some of the more complicated cases I've had in the transform stage.

Speaker:

Very nice.

Speaker:

I guess that this brings us to the end of today's episode, unless

Speaker:

you have anything more to add.

Speaker:

I'm good.

Speaker:

We have some great talks coming up.

Speaker:

Our goal is to publish one per week over the next few months to support the

Speaker:

community in the migration process.

Speaker:

Performance is something we care deeply about at Tag1, and we did touch on

Speaker:

performance a little bit in today's episode because it applies to migrations.

Speaker:

When you're handling really large data sets, a full data migration

Speaker:

can take 12 hours or even days.

Speaker:

We'll do a handful of talks on this topic, including how to profile

Speaker:

and tune a migration, and a talk about incremental migrations.

Speaker:

Every project owner wants their migration to be a success.

Speaker:

So we will dedicate an episode to discuss the most important

Speaker:

factors for a successful Drupal 7 to Drupal 10 migration.

Speaker:

Other topics include porting custom code, the future of migrate tooling,

Speaker:

how to port a theme, and so much more.

Speaker:

We hope that you'll tune in and enjoy our upcoming team talks.

Speaker:

A huge thank you to the Tag1 team, Benji Fisher and Mike Ryan.

Speaker:

Thank you for joining me.

Speaker:

Make sure that you check out the other segments in this series.

Speaker:

There will be links to them in the show notes, along with links to the

Speaker:

modules and documentation and other things that we mentioned today.

Speaker:

If you like this talk, please remember to upvote, subscribe and share it.

Speaker:

Check out our past talks at Tag1.com/ttt.

Speaker:

That's three T's for Tag1 Team Talks.

Speaker:

As always, we'd love your feedback and any topic suggestions.

Speaker:

You can always write to us at ttt@tag1.com.

Speaker:

Again, that's three T's for Tag1 Team Talks.

Speaker:

One more time, big thank you to our guests and everybody who tuned in.

Speaker:

Thanks for joining us.

Speaker:

Thanks.

Speaker:

Bye.
