Amy Qi: Hello, I'm Amy from McGill University, and with me is Jay Kerr-Wilson, a partner in Fasken's Ottawa office and head of the firm's copyright practice. Welcome to the first episode in our series, Perspectives: AI and Copyright, exploring the legal issues posed by the adoption of generative artificial intelligence systems. In today's episode, we're exploring the legal debate surrounding the use of copyright-protected content to train AI models, and how courts and governments are responding to the conflicts between AI developers and content creators and owners. For context, so we're all on the same page: what exactly is machine learning and generative AI?
Jay Kerr-Wilson: Thanks, Amy. Artificial intelligence is an umbrella term for technology that's designed to perform a human task. So you've got a computer system or a machine doing something that, until then, humans had done. We've had AI in its simplest form for a long time; stoplights and calculators, for example, are both very simple forms of AI that have been around for decades. Machine learning is a more complex version of artificial intelligence, and it focuses on teaching algorithms to learn from data without being explicitly programmed. So you set up a system that can analyze data, learn patterns in that data, and then predict behaviour or future events based on the patterns it has learned. A really simple example of machine learning that everyone is familiar with: if you have a Spotify or Netflix account, it recommends new shows for you to watch or new music for you to listen to based on your prior consumption. The AI has kept track of everything you've consumed on the platform, and it says, "Oh, Amy's very interested in historical drama," and then it uses that to predict other titles in that genre that you might also enjoy.
The game changer we've seen in the last few years is what's referred to as generative AI, and ChatGPT is the obvious example. What's different about generative AI is that where machine learning could apply a rule to a set of data and then predict outcomes, generative AI is able to take a huge amount of data, learn the rules itself, and then use those rules to generate brand new data. So ChatGPT, by reviewing huge numbers of examples of written documents, is able to learn how documents are constructed, how humans write documents, and then in fact write its own documents by applying what it has learned and creating something brand new. So that's the evolution from artificial intelligence, through machine learning, to generative AI, which is what we're talking about today.
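As a rough illustration of the machine-learning example Jay gives above, here is a minimal sketch in Python of a recommender that tracks what a user has watched, infers a favourite genre, and suggests unwatched titles in that genre. The catalogue, titles, and viewing history are invented for illustration; real services use far richer signals and models.

```python
from collections import Counter

# Hypothetical catalogue and viewing history, invented for illustration.
catalogue = {
    "The Crown": "historical drama",
    "Wolf Hall": "historical drama",
    "Chernobyl": "historical drama",
    "Brooklyn Nine-Nine": "comedy",
}
watched = ["The Crown", "Wolf Hall"]

# Learn the pattern: count the genres the user has consumed and pick the most frequent one.
favourite_genre = Counter(catalogue[title] for title in watched).most_common(1)[0][0]

# Predict: recommend unwatched titles in that genre.
recommendations = [
    title for title, genre in catalogue.items()
    if genre == favourite_genre and title not in watched
]

print(recommendations)  # ['Chernobyl']
```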
Amy Qi: You say that machine learning involves teaching algorithms to learn from data. How specifically does generative AI use this data to train models?
Jay Kerr-Wilson: A word of caution: I'm not a computer scientist, so this is going to be in very simple terms. What large language models, which are text-based generative AI systems like ChatGPT, are able to do is analyse hundreds of millions of documents, examples of written material, and break those documents down into very small fragments of text that are called tokens. These tokens are then assigned numerical values. What the system does is learn the relationships between tokens, but at a huge scale, hundreds of millions of times. What words tend to follow these combinations of words? What letters tend to follow these combinations of letters? It runs through the system, then compares what it thinks it has learned to its data set, which is the works it's learning from. Then it makes adjustments, and then it does it again, and then it makes more adjustments. By processing this huge amount of information over and over and over again, through many, many iterations, the AI system is, in effect, able to learn how to mimic human-generated content. So it's able to predict what a human might write in response to a given prompt. These things are all driven by prompts that a user gives, asking the system to write a specific piece of text, and it can then mimic what it expects a human would write when given that same prompt.
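To make the tokenize-and-predict idea concrete, here is a minimal sketch in Python using an invented toy corpus and a simple word-level counter. Real large language models learn the relationships between tokens with neural networks over many rounds of prediction and adjustment; this sketch just counts which token follows which, which is enough to show the shape of the process.

```python
from collections import defaultdict, Counter

corpus = "the cat sat on the mat . the dog sat on the rug ."  # invented toy training text

# 1. Tokenize: break the text into small fragments (here, whole words).
tokens = corpus.split()

# 2. Assign each distinct token a numerical value (an ID).
vocab = {tok: i for i, tok in enumerate(dict.fromkeys(tokens))}
ids = [vocab[tok] for tok in tokens]

# 3. Learn relationships: count which token ID tends to follow which.
follows = defaultdict(Counter)
for current, nxt in zip(ids, ids[1:]):
    follows[current][nxt] += 1

# 4. Predict: given a prompt word, suggest the most likely next token.
id_to_tok = {i: tok for tok, i in vocab.items()}

def predict_next(word: str) -> str:
    counts = follows[vocab[word]]
    return id_to_tok[counts.most_common(1)[0][0]]

print(predict_next("sat"))  # "on" -- it follows "sat" every time in the toy corpus
print(predict_next("the"))  # "cat" -- the first of several equally common followers
```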
Amy Qi: Great. I think that makes sense. Pivoting to the intellectual property landscape: what does the existing copyright law in Canada look like in the context of AI and generative AI?
Jay Kerr-Wilson: The reason the discussion about generative AI has become so entangled with copyright, not only in Canada but around the world, is that to train large language models, you have to provide the system with access to hundreds of millions of copies of works. And any time you're making a copy of something, you're engaging the copyright law of the jurisdiction you're in. ChatGPT, for example, has gone out and basically scoured the internet, websites, bulletin boards, looking for examples of articles and posts and social media content and anything that involves written language. It has scraped all that content and ingested it into the training set that's then used in the training process I described, and all of that material, or virtually all of it, will be protected by copyright. It's material that's owned by somebody, whoever the author or owner is. And in most cases, the people developing these AI systems, these large language models, have not asked for permission to use the content. They've simply scraped it and then used it to train the AI model. So now we're at a stage where the various groups of owners of this content, whether authors of books, or people who take photographs, or people who create artistic images, are all starting to challenge the fact that their work has been used without their permission in the training of these models.
Canada's Copyright Act has not yet been amended specifically to deal with generative AI. But copyright law, like all law, applies to AI, so where there's copying taking place, the Copyright Act applies to those copies. And in Canada, as in most other jurisdictions, it's an infringement of copyright to make a copy of a work without the owner's consent, unless there's an applicable exception, and we'll talk about the exceptions a little bit later. But the starting presumption is that unless the people who are developing these AI systems can establish that an exception to copyright applies to the training of AI, they are potentially liable for infringing copyright in a large amount of written material. Why this is such a challenge for policy makers and for the industry is that if an AI system is trained using hundreds of millions of documents owned by millions of people, it's hard to conceive of a licensing system where the people who own the copyright in this content would get paid anything more than fractions of a penny for each work, without creating such a huge cost for AI developers that AI development would become impossible. So when you're talking about the scale of hundreds of millions of documents, it's really hard to figure out a price for this activity that will compensate authors with more than a few pennies but still make the training of AI models financially feasible. That's the tension and the debate that's going on right now.
Amy Qi: I see. Has there been any government response to address some of these issues raised by AI that you're talking about?
Jay Kerr-Wilson: Yeah, it's interesting. In Canada, there was an initial consultation on whether or not the Copyright Act should be amended to respond to the development of artificial intelligence. Stakeholders filed their submissions and the government reviewed them. This was five or six years ago, and the government came out and said, well, we don't think it's necessary to amend the Copyright Act right now; it's not an urgent situation, and we're going to keep monitoring it. Then shortly after that, we had the release of ChatGPT and a whole bunch of other AI systems, both text-based and image-based, and they changed the game very, very quickly. So we started to see a lot of development in the use of AI and a lot of attention being paid to it by copyright owners. The government then launched a second consultation specifically to deal with generative AI and the new developments that had come out.
And, predictably, people from the technology sector, people involved in the development or use of AI systems, were suggesting that the Copyright Act should be amended to include an exception to copyright, what's often referred to as a text and data mining exception. An exception means that if you make a particular use of copyright material, that use is not an infringement of copyright, and a lot of the time there are conditions attached. So the idea would be that if you're using material you've scraped from the internet to train a large language model, you would not need the consent of the people who own the copyright in that material in order to train your model. Understandably, the creative industries and people who own copyright in material that's available over the internet took a very different view, and they were advocating that there should not be any exception that applies to the development of AI systems. They said that these companies had raised billions of dollars in investment to develop their AI, that they were starting to generate huge amounts of revenue from licensing their AI systems, and that it wasn't fair that these systems were being built on the backs of creators who were receiving no compensation for the use of their works. That's where we are right now in Canada.
We've had this consultation, and the government has received all of these submissions from these various different perspectives. But we've had a federal election and we have a new government, and obviously the new government is facing a lot of other priorities on the trade front and on foreign relations with our largest neighbour. At some point, though, I think they're going to have to come back, take another look at the development of AI, and try to come up with a policy that will then be reflected in amendments to the Copyright Act.
Amy Qi: I see. You mentioned earlier that there are some exceptions to copyright infringement. In Canada, there's a fair dealing exception to copyright for research. What does this exception entail?
Jay Kerr-Wilson: Section 29 of the Copyright Act is what's known as the fair dealing exception. People may have also heard the term fair use; fair use is the equivalent concept in the United States, while in Canada we refer to it as fair dealing. Fair dealing applies to a very specific list of purposes, like research, education, parody and satire, and news reporting. If you're using copyright material for one of these very specific purposes and your use is fair, then you don't need the permission of the copyright owner and you don't need to pay compensation. One of those specific purposes is fair dealing for the purpose of research. So the argument would be that training large language models and other types of generative AI systems is research, and you could therefore use copyright material for the training of AI as long as the use was fair. Whether or not a use is fair is a very fact-specific examination that courts will go through in infringement proceedings where fair dealing has been raised as a defence.
They will look at the purpose of the dealing: is it for research, or is it for some other purpose? The amount of the dealing: are you dealing with the whole work, or just part of the work? The nature of the work: is it a work that's already widely available, or is it confidential? And one of the key factors in the debate around AI is the effect of the dealing on the work. If you're involved in fair dealing for the purpose of research, does that dealing diminish the economic value of the work to the owner? There have been a number of cases started in Canada, but we haven't had any decisions yet. But that's how fair dealing for the purpose of research might apply.
Another exception that I think courts will want to look at in the context of generative AI is Canada's specific exception covering temporary reproductions for a technological purpose. If you have to make a very brief copy of a work, or a reproduction of a work, for a temporary purpose, in order to facilitate some technological process, and the copy is destroyed once that purpose has been fulfilled, then that's another exception to copyright that could apply to the use of copyright materials to train AI systems.
Amy Qi: You mentioned that in the US, this concept of fair dealing is called fair use. Section 107 of the Copyright Act outlines the four factors for determining fair use. Can you briefly explain these criteria as well?
Jay Kerr-Wilson: Sure. There is a lot of overlap between the fair use analysis in the US and the fair dealing analysis in Canada, but there are some differences. One of the big differences is that in the US, if you want to make fair use of somebody else's work, your use has to be what's known as transformative: you have to use the work to, in effect, create something new. That's a requirement of the use being fair in the US. In Canada, we don't have that same factor in our fair dealing analysis; whether or not the dealing is transformative isn't something that courts will look at. But in many other ways the two analyses are very similar. Courts in the US will look at whether or not the use has a prejudicial impact on the market value of the copyrighted work, what the nature of the work is, and whether the proposed fair use is for a commercial or a non-commercial purpose. So it's similar, but the big difference is the US requirement that the use be transformative.
Amy Qi: Okay, great. Now that we understand some of the tech and legal background associated with AI training data, I think it's interesting to see how the theory is applied in court. Recently, there have been several landmark cases and court decisions pertaining to the use of copyrighted material to train AI models. Two notable cases are the lawsuits that Anthropic and Meta are each facing in the US District Court in San Francisco. Starting with Anthropic, could you give us a summary of the context for this case?
Jay Kerr-Wilson: Sure. Anthropic is an AI company that has developed a family of large language models named Claude, which works in a similar way to ChatGPT. In order to train its large language model, Anthropic really focused on using books in its training set. It had determined that books, by their very nature, provide much better training content than blog posts, articles, or other smaller pieces of text. Books tend to be much better written, they tend to be more complex, and they express more comprehensive thoughts. So if you want to train a system to predict and produce high-value literary content, the idea was that books provide the ideal training set. Anthropic built its training data set from two sources. It licensed a large number of copyrighted books, meaning it actually purchased books, scanned them into digital form, and fed them into its training set; these were books it had bought and paid for. But it was also alleged that Anthropic had downloaded millions of copies of copyrighted books from what are known as pirate sites on the internet, large files containing millions and millions of digital copies of books for which the publishers and authors had never given their consent. Anthropic used both of these sources of books, of literary material, to train its Claude large language model. The lawsuit was what's known as a class action. A class action is where one plaintiff represents a whole class of similarly situated plaintiffs. In this case, an author was suing Anthropic in California for the unauthorised use of his book, but his litigation would also cover all authors in the class whose books had been used by Anthropic without consent.
Amy Qi: And what was the court's decision in this case?
Jay Kerr-Wilson: There were a couple of different layers to the court's decision, and it turned on the fact that there were two sources of data: the books that Anthropic had bought, and the scanned digital copies it had acquired without the authors' consent. It also turned on the fact that Anthropic, in addition to training its large language model Claude, was building a central library of as many literary works as it could. It wanted both to maintain a persistent central library of books and to train its large language model. On the training data, the court did the fair use analysis and said that training large language models and other AI systems was absolutely transformative. It found that there was no impact on the commercial value of the works used in the training because Anthropic had put safeguards in place. If you were using Claude, the large language model, you couldn't ask it to simply reproduce one of the books in the training data; Claude was designed not to reproduce verbatim a book that had been included in its system, and it would only generate brand new content. So the court concluded that Claude isn't going to produce copies of Catcher in the Rye that would then compete with the original Catcher in the Rye included in the training set. Based on its fair use analysis, the court concluded that training LLMs in the way Anthropic was doing it was fair use and therefore not an infringement of copyright. On the issue of maintaining this large, persistent digital library of books, the court said Anthropic had a right to keep digital copies of the books it had legitimately purchased and scanned, because it owned those books, but it did not have a right to keep the pirated copies in that collection. So the court ultimately found that there was liability for the reproduction of the copyright material in the persistent library, but that fair use covered the training of the model, which was the big takeaway from the case.
Amy Qi: I know you've already briefly touched on this, but just to get into more specifics: how were the four factors of American fair use law that we talked about earlier evaluated in this case?
Jay Kerr-Wilson: Sure, just to go through the list. The purpose and character of the use in this case was using the copies of the books to train a large language model, and the court had no trouble finding that the purpose was transformative, because of the process of breaking the works down into small fragments and the fact that you're producing something new at the end. On the nature of the works, the fact that the authors' works were creative worked against Anthropic. But on the amount of the use, the court actually found in Anthropic's favour, because one of the things about training large language models is that you want them to produce content that is as free of bias and as accurate as possible, and you want to avoid AI systems that hallucinate. In order to generate the best possible outcomes, you have to have as much material from as many different sources as you can. So in this case, the court said the amount of material Anthropic was using actually worked in favour of fair use, because it made for better outcomes. And on the effect on the market, the court found in favour of Anthropic because the plaintiffs could not establish that Anthropic was producing copies that competed directly with the works in the training set.
Amy Qi: Very recently, in a court filing on August 26th, 2025, Anthropic actually announced that it has reached a preliminary settlement in this class action lawsuit. What does this mean, and what are the impacts of the settlement?
Jay Kerr-Wilson: As I said, although Anthropic was successful on the fair use question for the training, it was not successful on the large collection, the library it had built. And because of the number of individual books in that library for which there was potential liability, Anthropic was facing a huge potential financial hit from damages. The settlement was a way to avoid having to go through the process of assessing those damages at trial, so it puts the question of damages to an end.
Amy Qi: Right. The other case we mentioned earlier involves Meta, and I think it serves as an interesting comparison to the Anthropic one, because both occurred in the same court and the facts are similar. Could you give a summary of the context for this Meta case?
Jay Kerr-Wilson: Sure. Meta, as most people know, is the company that operates social media platforms such as Facebook and Instagram, but it's also developing its own large language models, called Llama. Similar to the Anthropic case, Meta had used a very large volume of books for which it had not obtained the consent of the publishers or the authors in order to train its models. Again, it was a class action, so there was a representative plaintiff author who represented a class of authors, and they were claiming copyright infringement because of the copies used to train the large language model.
Amy Qi: And what was the court's decision in this case?
Jay Kerr-Wilson: This was an interesting outcome. What happened was that Meta brought a motion for summary judgement, and the court granted Meta's motion, so Meta ultimately won the case on the summary judgement motion. But what was really interesting was that the court did the same kind of fair use analysis that the court in Anthropic had done and came to almost the opposite result, largely on the question of what the impact of large language models is on the market for published books. Why this is particularly interesting is that this is the exact same court, the same district court in California, just different judges, and these decisions were issued just a few days apart. The judge in the Meta decision, in fact, explicitly references and disagrees with the decision of the same court in the Anthropic case. On the question of the effect of large language models on the copyright works, the books used in training, the court in the Anthropic case said, well, Anthropic's large language model, Claude, isn't producing copies of the books, so it's not competing with the books in the training set; it's not reproducing those books, it's producing new ones. So there's no effect on the value of Catcher in the Rye when the large language model produces brand new books or new written texts that aren't Catcher in the Rye. It took a very narrow view of the analysis of the impact on the market for books.
The court in the Meta decision took a very different view. It took a very broad view of the impact of large language models on the publishing industry in general, and it was very concerned that because large language models are able to crank out thousands or millions of books very quickly and very cheaply, based on what they have learned from published works, they would overwhelm the publishing market. Human authors would have a hard time making a living, because they would be competing against all of these mass-produced AI-generated books flooding the publishing industry. So it's a very different analysis: a narrow perspective asking whether the individual author will be harmed, versus a very broad, policy-based perspective in the Meta court saying that the publishing industry, or at least the human element of it, is in peril if AI is allowed to run unchecked. We haven't seen the end of this debate in the US, and very similar class action lawsuits have been commenced in Canada. We don't have any decisions in Canada yet, but the court battle over the use of AI is just beginning. It's far from over.
Amy Qi: You mentioned Canada there at the end. How do you think these cases could be evaluated if they were transposed to a Canadian copyright law context?
Jay Kerr-Wilson: It's interesting, because Canada tends to produce results under its fair dealing analysis that are more user-friendly than in the United States. We've had decisions of the Supreme Court of Canada saying that fair dealing isn't just a technical exception to copyright; it's actually a user's right. That's how fair dealing has been interpreted in Canada, and the court has told us that copyright has to be analysed as a balance between the interests of the user, the public interest, and the interests of the owner. So although we haven't seen any cases yet, and I think there are lots of good arguments on both sides, my prediction would be that a Canadian court's analysis would likely be closer to the Anthropic analysis: it would take a very narrow view of the impact of the use or the dealing on the particular work at issue, and not take the broad, policy-based approach that the court took in the Meta case.
Amy Qi: That makes sense. Just to end off the episode with a more general question: where do we go from here?
Jay Kerr-Wilson: These issues need to be resolved. I think everyone understands that the generative AI genie is out of the bottle and it's not going back in. All industries are starting to adopt AI solutions and AI technologies, so there needs to be some sort of resolution to these issues, and it's going to have to be a policy-based resolution. Ultimately, I think there are going to have to be legislative amendments. We may get some court decisions before that happens, and oftentimes court decisions will inform what governments do with legislative amendments. The challenge is that it takes a long time for these cases to go through the court system, and it takes a long time for governments to pass legislation to amend the Copyright Act to deal with these emerging technologies, and the technology is moving really fast. So even if we figure out a way to resolve the issues confronting us today, three years from now we're probably going to have a whole different set of challenges that will also have to be addressed.
Amy Qi: Okay, I think that's a good note to end on. Thank you, Jay, for your insight on generative AI and the legal issues associated with it. As we've seen, there's a lot of nuance involved, but I think you've really helped clarify the legal debate on the potential copyright infringement involved in AI training data. Thank you for listening to this episode of Perspectives: AI and Copyright, and make sure to tune in for the next one.