Amy Qi: Hello, I'm Amy from McGill University, and with me is Jay Kerr-Wilson, a partner in Fasken's Ottawa office and head of the firm's copyright practice. Welcome to the first episode in our series, Perspectives: AI and Copyright, exploring the legal issues posed by the adoption of generative artificial intelligence systems. In today's episode, we're exploring the legal debate surrounding the use of copyright-protected content to train AI models, and how courts and governments are responding to the conflicts between AI developers and content creators and owners. For context, so we're all on the same page: what exactly is machine learning and generative AI?
Jay Kerr-Wilson: Thanks, Amy. Artificial intelligence is an umbrella term for technology that's designed to perform a human task. So you've got a computer system or a machine doing something that, until then, humans had done. We've had AI in its simplest form for a long time; stoplights and calculators, for example, are both very simple forms of AI that have been around for decades. Machine learning is a more complex version of artificial intelligence, and it focuses on teaching algorithms to learn from data without being explicitly programmed. So you set up a system that can analyze data, learn patterns in that data, and then predict behaviour or future events based on the patterns it has learned. A really simple example of machine learning that everyone is familiar with: if you have a Spotify or Netflix account, it recommends new shows for you to watch or new music for you to listen to based on your prior consumption. The AI has kept track of everything you've consumed on the platform, and it says, "Oh, Amy's very interested in historical drama," and then it uses that to predict other titles in that genre that you might also enjoy.
The game changer we've seen in the last few years is what's referred to as generative AI, and ChatGPT is the obvious example. What's different about generative AI is that where machine learning could apply a rule to a set of data and then predict outcomes, generative AI is able to take a huge amount of data, learn the rules itself, and then use those rules to generate brand new data. So ChatGPT, by reviewing huge numbers of examples of written documents, is able to learn how documents are constructed, how humans write documents, and then in fact write its own documents by applying what it has learned and creating something brand new. So that's the evolution from artificial intelligence, through machine learning, to generative AI, which is what we're talking about today.
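As a rough illustration of the machine-learning example Jay gives above, here is a minimal sketch in Python of a recommender that tracks what a user has watched, infers a favourite genre, and suggests unwatched titles in that genre. The catalogue, titles, and viewing history are invented for illustration; real services use far richer signals and models.

```python
from collections import Counter

# Hypothetical catalogue and viewing history, invented for illustration.
catalogue = {
    "The Crown": "historical drama",
    "Wolf Hall": "historical drama",
    "Chernobyl": "historical drama",
    "Brooklyn Nine-Nine": "comedy",
}
watched = ["The Crown", "Wolf Hall"]

# Learn the pattern: count the genres the user has consumed and pick the most frequent one.
favourite_genre = Counter(catalogue[title] for title in watched).most_common(1)[0][0]

# Predict: recommend unwatched titles in that genre.
recommendations = [
    title for title, genre in catalogue.items()
    if genre == favourite_genre and title not in watched
]

print(recommendations)  # ['Chernobyl']
```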
Amy Qi: You say that machine learning involves teaching algorithms to learn from data. How specifically does generative AI use this data to train models?
Jay Kerr-Wilson: A word of caution: I'm not a computer scientist, so this is going to be in very simple terms. What large language models, which are text-based generative AI systems like ChatGPT, are able to do is analyse hundreds of millions of documents, examples of written material, and break those documents down into very small fragments of text that are called tokens. These tokens are then assigned numerical values. What the system does is learn the relationships between tokens, but at a huge scale, hundreds of millions of times. What words tend to follow these combinations of words? What letters tend to follow these combinations of letters? It runs through the system, then compares what it thinks it has learned to its data set, which is the works it's learning from. Then it makes adjustments, and then it does it again, and then it makes more adjustments. By processing this huge amount of information over and over and over again, through many, many iterations, the AI system is, in effect, able to learn how to mimic human-generated content. So it's able to predict what a human might write in response to a given prompt. These things are all driven by prompts that a user gives, asking the system to write a specific piece of text, and it can then mimic what it expects a human would write when given that same prompt.
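To make the tokenize-and-predict idea concrete, here is a minimal sketch in Python using an invented toy corpus and a simple word-level counter. Real large language models learn the relationships between tokens with neural networks over many rounds of prediction and adjustment; this sketch just counts which token follows which, which is enough to show the shape of the process.

```python
from collections import defaultdict, Counter

corpus = "the cat sat on the mat . the dog sat on the rug ."  # invented toy training text

# 1. Tokenize: break the text into small fragments (here, whole words).
tokens = corpus.split()

# 2. Assign each distinct token a numerical value (an ID).
vocab = {tok: i for i, tok in enumerate(dict.fromkeys(tokens))}
ids = [vocab[tok] for tok in tokens]

# 3. Learn relationships: count which token ID tends to follow which.
follows = defaultdict(Counter)
for current, nxt in zip(ids, ids[1:]):
    follows[current][nxt] += 1

# 4. Predict: given a prompt word, suggest the most likely next token.
id_to_tok = {i: tok for tok, i in vocab.items()}

def predict_next(word: str) -> str:
    counts = follows[vocab[word]]
    return id_to_tok[counts.most_common(1)[0][0]]

print(predict_next("sat"))  # "on" -- it follows "sat" every time in the toy corpus
print(predict_next("the"))  # "cat" -- the first of several equally common followers
```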
Amy Qi: Great. I think that makes sense. Pivoting to the intellectual property landscape: what does the existing copyright law in Canada look like in the context of AI and generative AI?
Jay Kerr-Wilson: The reason the discussion about generative AI has become so entangled with copyright, not only in Canada but around the world, is that to train large language models, you have to provide the system with access to hundreds of millions of copies of works. And any time you're making a copy of something, you're engaging the copyright law of the jurisdiction you're in. ChatGPT, for example, has gone out and basically scoured the internet, websites, bulletin boards, looking for examples of articles and posts and social media content and anything that involves written language. It has scraped all that content and ingested it into the training set that's then used in the training process I described, and all of that material, or virtually all of it, will be protected by copyright. It's material that's owned by somebody, whoever the author or owner is. And in most cases, the people developing these AI systems, these large language models, have not asked for permission to use the content. They've simply scraped it and then used it to train the AI model. So now we're at a stage where the various groups of owners of this content, whether authors of books, or people who take photographs, or people who create artistic images, are all starting to challenge the fact that their work has been used without their permission in the training of these models.
Canada's Copyright Act has not yet been amended specifically to deal with generative AI. But copyright law, like all law, applies to AI, so where there's copying taking place, the Copyright Act applies to those copies. And in Canada, as in most other jurisdictions, it's an infringement of copyright to make a copy of a work without the owner's consent, unless there's an applicable exception, and we'll talk about the exceptions a little bit later. But the starting presumption is that unless the people who are developing these AI systems can establish that an exception to copyright applies to the training of AI, they are potentially liable for infringing copyright in a large amount of written material. Why this is such a challenge for policy makers and for the industry is that if an AI system is trained using hundreds of millions of documents owned by millions of people, it's hard to conceive of a licensing system where the people who own the copyright in this content would get paid anything more than fractions of a penny for each work, without creating such a huge cost for AI developers that AI development would become impossible. So when you're talking about the scale of hundreds of millions of documents, it's really hard to figure out a price for this activity that will compensate authors with more than a few pennies but still make the training of AI models financially feasible. That's the tension and the debate that's going on right now.
Amy Qi: I see. Has there been any government response to address some of these issues raised by AI that you're talking about?
Jay Kerr-Wilson: Yeah, it's interesting. In Canada, there was an initial consultation on whether or not the Copyright Act should be amended to respond to the development of artificial intelligence. Stakeholders filed their submissions and the government reviewed them. This was five or six years ago, and the government came out and said, well, we don't think it's necessary to amend the Copyright Act right now; it's not an urgent situation, and we're going to keep monitoring it. Then shortly after that, we had the release of ChatGPT and a whole bunch of other AI systems, both text-based and image-based, and they changed the game very, very quickly. So we started to see a lot of development in the use of AI and a lot of attention being paid to it by copyright owners. The government then launched a second consultation specifically to deal with generative AI and the new developments that had come out.
And, predictably, people from the technology sector, people involved in the development or use of AI systems, were suggesting that the Copyright Act should be amended to include an exception to copyright, what's often referred to as a text and data mining exception. An exception means that if you make a particular use of copyright material, that use is not an infringement of copyright, and a lot of the time there are conditions attached. So the idea would be that if you're using material you've scraped from the internet to train a large language model, you would not need the consent of the people who own the copyright in that material in order to train your model. Understandably, the creative industries and people who own copyright in material that's available over the internet took a very different view, and they were advocating that there should not be any exception that applies to the development of AI systems. They said that these companies had raised billions of dollars in investment to develop their AI, that they were starting to generate huge amounts of revenue from licensing their AI systems, and that it wasn't fair that these systems were being built on the backs of creators who were receiving no compensation for the use of their works. That's where we are right now in Canada.
We've had this consultation, and the government has received all of these submissions from these various different perspectives. But we've had a federal election and we have a new government, and obviously the new government is facing a lot of other priorities on the trade front and on foreign relations with our largest neighbour. At some point, though, I think they're going to have to come back, take another look at the development of AI, and try to come up with a policy that will then be reflected in amendments to the Copyright Act.
Amy Qi: I see. You mentioned earlier that there are some exceptions to copyright infringement. In Canada, there's a fair dealing exception to copyright for research. What does this exception entail?
Jay Kerr-Wilson: Section 29 of the Copyright Act is what's known as the fair dealing exception. People may have also heard the term fair use; fair use is the equivalent concept in the United States, while in Canada we refer to it as fair dealing. Fair dealing applies to a very specific list of purposes, like research, education, parody and satire, and news reporting. If you're using copyright material for one of these very specific purposes and your use is fair, then you don't need the permission of the copyright owner and you don't need to pay compensation. One of those specific purposes is fair dealing for the purpose of research. So the argument would be that training large language models and other types of generative AI systems is research, and you could therefore use copyright material for the training of AI as long as the use was fair. Whether or not a use is fair is a very fact-specific examination that courts will go through in infringement proceedings where fair dealing has been raised as a defence.
They will look at the purpose of the dealing: is it for research, or is it for some other purpose? The amount of the dealing: are you dealing with the whole work, or just part of the work? The nature of the work: is it a work that's already widely available, or is it confidential? And one of the key factors in the debate around AI is the effect of the dealing on the work. If you're involved in fair dealing for the purpose of research, does that dealing diminish the economic value of the work to the owner? There have been a number of cases started in Canada, but we haven't had any decisions yet. But that's how fair dealing for the purpose of research might apply.
Another exception that I think courts will want to look at in the context of generative AI is Canada's specific exception covering temporary reproductions for a technological purpose. If you have to make a very brief copy of a work, or a reproduction of a work, for a temporary purpose, in order to facilitate some technological process, and the copy is destroyed once that purpose has been fulfilled, then that's another exception to copyright that could apply to the use of copyright materials to train AI systems.
Amy Qi: You mentioned that in the US, this concept of fair dealing is called fair use. Section 107 of the Copyright Act outlines the four factors for determining fair use. Can you briefly explain these criteria as well?
Jay Kerr-Wilson: Sure. There is a lot of overlap between the fair use analysis in the US and the fair dealing analysis in Canada, but there are some differences. One of the big differences is that in the US, if you want to make fair use of somebody else's work, your use has to be what's known as transformative: you have to use the work to, in effect, create something new. That's a requirement of the use being fair in the US. In Canada, we don't have that same factor in our fair dealing analysis; whether or not the dealing is transformative isn't something that courts will look at. But in many other ways the two analyses are very similar. Courts in the US will look at whether or not the use has a prejudicial impact on the market value of the copyrighted work, what the nature of the work is, and whether the proposed fair use is for a commercial or a non-commercial purpose. So it's similar, but the big difference is the US requirement that the use be transformative.
Amy Qi: Okay, great. Now that we understand some of the tech and legal background associated with AI training data, I think it's interesting to see how the theory is applied in court. Recently, there have been several landmark cases and court decisions pertaining to the use of copyrighted material to train AI models. Two notable cases are the lawsuits that Anthropic and Meta are each facing in the US District Court in San Francisco. Starting with Anthropic, could you give us a summary of the context for this case?
Jay Kerr-Wilson: Sure. Anthropic is an AI company that has developed a family of large language models named Claude, which works in a similar way to ChatGPT. In order to train its large language model, Anthropic really focused on using books in its training set. It had determined that books, by their very nature, provide much better training content than blog posts, articles, or other smaller pieces of text. Books tend to be much better written, they tend to be more complex, and they express more comprehensive thoughts. So if you want to train a system to predict and produce high-value literary content, the idea was that books provide the ideal training set. Anthropic built its training data set from two sources. It licensed a large number of copyrighted books, meaning it actually purchased books, scanned them into digital form, and fed them into its training set; these were books it had bought and paid for. But it was also alleged that Anthropic had downloaded millions of copies of copyrighted books from what are known as pirate sites on the internet, large files containing millions and millions of digital copies of books for which the publishers and authors had never given their consent. Anthropic used both of these sources of books, of literary material, to train its Claude large language model. The lawsuit was what's known as a class action. A class action is where one plaintiff represents a whole class of similarly situated plaintiffs. In this case, an author was suing Anthropic in California for the unauthorised use of his book, but his litigation would also cover all authors in the class whose books had been used by Anthropic without consent.
Amy Qi: And what was the court's decision in this case?
Jay Kerr-Wilson: There were a couple of different layers to the court's decision, and it turned on the fact that there were two sources of data: the books that Anthropic had bought, and the scanned digital copies it had acquired without the authors' consent. It also turned on the fact that Anthropic, in addition to training its large language model Claude, was building a central library of as many literary works as it could. It wanted both to maintain a persistent central library of books and to train its large language model. On the training data, the court did the fair use analysis and said that training large language models and other AI systems was absolutely transformative. It found that there was no impact on the commercial value of the works used in the training because Anthropic had put safeguards in place. If you were using Claude, the large language model, you couldn't ask it to simply reproduce one of the books in the training data; Claude was designed not to reproduce verbatim a book that had been included in its system, and it would only generate brand new content. So the court concluded that Claude isn't going to produce copies of Catcher in the Rye that would then compete with the original Catcher in the Rye included in the training set. Based on its fair use analysis, the court concluded that training LLMs in the way Anthropic was doing it was fair use and therefore not an infringement of copyright. On the issue of maintaining this large, persistent digital library of books, the court said Anthropic had a right to keep digital copies of the books it had legitimately purchased and scanned, because it owned those books, but it did not have a right to keep the pirated copies in that collection. So the court ultimately found that there was liability for the reproduction of the copyright material in the persistent library, but that fair use covered the training of the model, which was the big takeaway from the case.
Amy Qi: I know you've already briefly touched on this, but just to get into more specifics: how were the four factors of American fair use law that we talked about earlier evaluated in this case?
Jay Kerr-Wilson: Sure, just to go through the list. The purpose and character of the use in this case was using the copies of the books to train a large language model, and the court had no trouble finding that the purpose was transformative, because of the process of breaking the works down into small fragments and the fact that you're producing something new at the end. On the nature of the works, the fact that the authors' works were creative worked against Anthropic. But on the amount of the use, the court actually found in Anthropic's favour, because one of the things about training large language models is that you want them to produce content that is as free of bias and as accurate as possible, and you want to avoid AI systems that hallucinate. In order to generate the best possible outcomes, you have to have as much material from as many different sources as you can. So in this case, the court said the amount of material Anthropic was using actually worked in favour of fair use, because it made for better outcomes. And on the effect on the market, the court found in favour of Anthropic because the plaintiffs could not establish that Anthropic was producing copies that competed directly with the works in the training set.
Amy Qi: Very recently, in a court filing on August 26th, 2025, Anthropic actually announced that it has reached a preliminary settlement in this class action lawsuit. What does this mean, and what are the impacts of the settlement?
Jay Kerr-Wilson: As I said, although Anthropic was successful on the fair use question for the training, it was not successful on the large collection, the library it had built. And because of the number of individual books in that library for which there was potential liability, Anthropic was facing a huge potential financial hit from damages. The settlement was a way to avoid having to go through the process of assessing those damages at trial, so it puts the question of damages to an end.
Amy Qi: Right. The other case we mentioned earlier involves Meta, and I think it serves as an interesting comparison to the Anthropic one, because both occurred in the same court and the facts are similar. Could you give a summary of the context for this Meta case?
Jay Kerr-Wilson: Sure. Meta, as most people know, is the company that operates social media platforms such as Facebook and Instagram, but it's also developing its own large language models, called Llama. Similar to the Anthropic case, Meta had used a very large volume of books for which it had not obtained the consent of the publishers or the authors in order to train its models. Again, it was a class action, so there was a representative plaintiff author who represented a class of authors, and they were claiming copyright infringement because of the copies used to train the large language model.
Amy Qi: And what was the court's decision in this case?
Jay Kerr-Wilson: This was an interesting outcome. What happened was that Meta brought a motion for summary judgement, and the court granted Meta's motion, so Meta ultimately won the case on the summary judgement motion. But what was really interesting was that the court did the same kind of fair use analysis that the court in Anthropic had done and came to almost the opposite result, largely on the question of what the impact of large language models is on the market for published books. Why this is particularly interesting is that this is the exact same court, the same district court in California, just different judges, and these decisions were issued just a few days apart. The judge in the Meta decision, in fact, explicitly references and disagrees with the decision of the same court in the Anthropic case. On the question of the effect of large language models on the copyright works, the books used in training, the court in the Anthropic case said, well, Anthropic's large language model, Claude, isn't producing copies of the books, so it's not competing with the books in the training set; it's not reproducing those books, it's producing new ones. So there's no effect on the value of Catcher in the Rye when the large language model produces brand new books or new written texts that aren't Catcher in the Rye. It took a very narrow view of the analysis of the impact on the market for books.
The court in the Meta decision took a very different view. It took a very broad view of the impact of large language models on the publishing industry in general, and it was very concerned that because large language models are able to crank out thousands or millions of books very quickly and very cheaply, based on what they have learned from published works, they would overwhelm the publishing market. Human authors would have a hard time making a living, because they would be competing against all of these mass-produced AI-generated books flooding the publishing industry. So it's a very different analysis: a narrow perspective asking whether the individual author will be harmed, versus a very broad, policy-based perspective in the Meta court saying that the publishing industry, or at least the human element of it, is in peril if AI is allowed to run unchecked. We haven't seen the end of this debate in the US, and very similar class action lawsuits have been commenced in Canada. We don't have any decisions in Canada yet, but the court battle over the use of AI is just beginning. It's far from over.
Amy Qi: You mentioned Canada there at the end. How do you think these cases could be evaluated if they were transposed to a Canadian copyright law context?
Jay Kerr-Wilson: It's interesting, because Canada tends to produce results under its fair dealing analysis that are more user-friendly than in the United States. We've had decisions of the Supreme Court of Canada saying that fair dealing isn't just a technical exception to copyright; it's actually a user's right. That's how fair dealing has been interpreted in Canada, and the court has told us that copyright has to be analysed as a balance between the interests of the user, the public interest, and the interests of the owner. So although we haven't seen any cases yet, and I think there are lots of good arguments on both sides, my prediction would be that a Canadian court's analysis would likely be closer to the Anthropic analysis: it would take a very narrow view of the impact of the use or the dealing on the particular work at issue, and not take the broad, policy-based approach that the court took in the Meta case.
Amy Qi: That makes sense. Just to end off the episode with a more general question: where do we go from here?
Jay Kerr-Wilson: These issues need to be resolved. I think everyone understands that the generative AI genie is out of the bottle and it's not going back in. All industries are starting to adopt AI solutions and AI technologies, so there needs to be some sort of resolution to these issues, and it's going to have to be a policy-based resolution. Ultimately, I think there are going to have to be legislative amendments. We may get some court decisions before that happens, and oftentimes court decisions will inform what governments do with legislative amendments. The challenge is that it takes a long time for these cases to go through the court system, and it takes a long time for governments to pass legislation to amend the Copyright Act to deal with these emerging technologies, and the technology is moving really fast. So even if we figure out a way to resolve the issues confronting us today, three years from now we're probably going to have a whole different set of challenges that will also have to be addressed.
Amy Qi: Okay, I think that's a good note to end on. Thank you, Jay, for your insight on generative AI and the legal issues associated with it. As we've seen, there's a lot of nuance involved, but I think you've really helped clarify the legal debate on the potential copyright infringement involved in AI training data. Thank you for listening to this episode of Perspectives: AI and Copyright, and make sure to tune in for the next one.