Who Owns the Story? AI, Copyright, and Authors

Rosemi Mederos: 00:01

If you have plot bunnies coming out of your plot holes, it’s time for a writing break.

Hello again. Is your summer off to a good start? I've got publishing news for you today. It's mostly AI-related news, and I'm sorry about that, but it is news that is relevant to writers. I don't think you need to hear about the latest celebrity deals or even that celebrity books are not selling as well as expected. I mean, that last part is true and it might be interesting, but only in a gossipy way, right? The last story I'm reporting on today is a bit gossipy, I guess, but it is a story about how an author and readers can be so easily harmed by people choosing to use AI to profit off of that author's work. It's the kind of story that keeps me from throwing my hands up and going, "Oh, well, AI is here. I guess that's just how it is now." That story is at the end, so stick with me.

Anyway, this AI situation has been thrust upon us, and we mustn't bury our heads in the sand, no matter how much we would like to.

Did you know that ostriches do not bury their heads in the sand when they feel threatened? You might not care as much as I do about the world's largest bird, but this myth is a great example of the power of storytelling. There are two reasons this myth got started. First, ostriches do not build nests. Instead they dig holes in the ground for their eggs. Their eggs are the largest eggs of any bird currently living. Each one is about 15 centimeters long and 13 centimeters wide, or 6 by 5 inches, and they weigh about 3 pounds, or 1.4 kilograms. A female ostrich lays about 100 eggs during the breeding season but only 7 to 10 eggs at a time. However, their nests, or holes in the ground, are communal, meaning several ostriches will keep their eggs in the same nest at the same time. So, the holes have to be large enough to hold about 60 eggs at a time. And they are large holes; the holes usually measure around 30 to 60 centimeters deep and 3 meters wide. That's 12 to 24 inches deep and about 9.8 feet wide for those who remain imperialistically imperial.

So, when ostriches tend to their eggs, it takes a while. From far away it looks like they've buried their heads in the sand when really, they're just checking on their eggs and answering the question, "Are you my mommy?" about 60 times. Despite being the fastest running bird alive, with top speeds of 97 kilometers per hour (that's 60 miles per hour) and sustained speeds of over 70 kilometers per hour (or 43 miles per hour), sometimes they don't think they can run away from a predator, so they stretch out on the floor to try to blend in with the terrain. Again, from far away, it looks like they stuck their heads in the sand. They didn't.

But the first recorded mention of ostriches supposedly burying their heads in the sand came from Roman naturalist Pliny the Elder about 2,000 years ago. Imagine someone telling one of your stories 2,000 years from now. Pliny the Elder thought he was speaking the truth, which might be due in part to the lack of modern optometry. But the ostrich myth has staying power because we humans sometimes willingly close our eyes to what's in front of us, and the ostrich with its head in the sand is such a great visual, you know? Just the uselessness of burying your head when the rest of you is sticking out. And as your editor, it is tempting to go on giving you just writing advice and no AI talk as though the number of books on Amazon has not tripled since the release of ChatGPT. Yes, tripled.

At this year's US Book Show, several publishing CEOs warned the industry against engaging in what they described as an AI "witch hunt." The comment came during a panel discussion about AI and publishing.

Over the past year, we've watched the publishing industry deal with things like the Shy Girl controversy, lawsuits against tech companies, concerns about AI-generated books flooding online marketplaces, and growing demands from authors for greater transparency. So, okay, it's easy to understand why some people have reached a fairly simple conclusion: if AI touched the manuscript at any point, that's a problem.

The publishing executives at the US Book Show earlier this month argued that the situation is more complicated than that.

Consider the tools many writers already use: speech recognition software to avoid repetitive stress injuries from typing; maybe an AI assistant to organize research notes or create a project timeline, things like that. I don't think these uses are on equal footing, but public conversations about AI often put these very different activities into a single category called "AI use."

As one publishing executive noted during the discussion, the industry risks creating an environment in which authors become reluctant to discuss their processes honestly because they fear public backlash.

But writers have good reasons to be concerned. Many of the systems now being incorporated into publishing workflows were trained, at least in part, on copyrighted material obtained without permission. Authors are also watching publishers experiment with AI-assisted translation, audiobook narration, metadata generation, and marketing.

Authorship has always been more collaborative and layered than publishing sometimes acknowledges. Readers care about authenticity, but authenticity has never meant that writers work entirely alone in a cabin somewhere, emerging months later with a perfect manuscript. Writing has always involved tools, collaborators, and intermediaries.

Editors shape manuscripts and get no credit for it. Research assistants contribute material but aren't granted authorship. Copyeditors revise sentences yet are overlooked even in the acknowledgments. Translators hold the copyright to their translations. Ghostwriters produce books that carry someone else's name on the cover.

So, yeah, publishing has never had a particularly clear definition of authorship and collaboration. And with AI it gets even murkier; two authors might say they "used AI" while describing entirely different processes. One may have asked for title suggestions, while the other generated entire chapters.

So, how much AI involvement should be disclosed? At what point does assistance become authorship? Should publishers establish industry-wide standards? How should contracts address these issues? What expectations do readers have when they purchase a book? We've all got questions, but there are no solid answers yet.

For now, the industry appears to be moving toward a standard based less on whether AI was used and more on how it was used, how extensively it shaped the work, and whether readers and publishers were given an accurate understanding of the creative process.

Artists are in a thicket that others planted around us, planted by stealing from us in the first place. We didn't ask to be here. They are using our own words against us. They've turned our ploughshares into swords. All we can do now is hack away at the thicket until we can make some sense of things. Let's stay informed without turning these swords on each other.

The Writing Break cafe is open, the Overthinking Couch is available, and the latest publishing news awaits.

Five publishers—including Hachette and Macmillan—have filed suit against Meta. The lawsuit alleges that Meta illegally downloaded and copied millions of copyrighted books and journal articles—including works allegedly obtained from notorious pirate sites such as LibGen and Anna's Archive—to train its Llama AI models. The plaintiffs describe the alleged conduct as "one of the most massive infringements of copyrighted material in history."

The publishers claim that Meta knowingly used pirated material to train Llama and, in some instances, removed copyright information from works to obscure where that content came from. The lawsuit even names Mark Zuckerberg personally, alleging that he approved or encouraged some of these decisions.

Meta, unsurprisingly, disagrees.

The company says it intends to fight the lawsuit aggressively--of course they do, they have all the money with which to do so--and the company maintains that training AI models on copyrighted material can qualify as fair use under US copyright law. Meta argues that AI systems create something new and transformative rather than simply reproducing existing works.

We've heard that argument before, and artists have not always won these court cases. I just don't think that current copyright law, which was designed ages before generative AI even existed, can still protect creators in a world where machines can learn from millions of books in a matter of days.

We just can't seem to answer basic questions, like: Who benefits from creative labor? Do authors have the right to decide how their work is used? These seem like ethical questions, maybe, but the answer always seems to be whoever has the most money wins. However, a judge did say in a previous case that future plaintiffs should bring stronger evidence of market harm, and the plaintiffs in this new case against Meta say they have such evidence. We shall see.

This lawsuit joins a growing list of legal challenges brought by authors, artists, musicians, and publishers.

Including Eminem, whose publishing company Eight Mile Style was granted permission by a federal judge to sue Meta for allegedly uploading over 240 of the publisher's songs to Facebook, Instagram, and WhatsApp. Eight Mile Style is seeking damages of $150,000 per case of infringement, totaling $109 million. Meta’s lawyers called the lawsuit “fanciful”, which Eminem has never been, and they also called the compensation “eye-popping”, which Eminem has always been. The federal judge rejected Meta’s arguments that the lawsuit was too vague. I mean, yeah, that's just how he talks, right? In the words of Eminem, "Cause what you say is what you say / Say what you say how you say it whenever you sayin' it."

I mentioned Anna's Archive earlier, so what's up with them? A federal judge in New York ruled against the notorious pirate website in a major copyright case brought by a coalition of publishers, including members of the Big Five.

The judge issued a default judgment after the operators of Anna's Archive failed to respond to the lawsuit. The judge ordered the site to immediately stop copying and distributing millions of copyrighted books and journals that publishers say were obtained illegally.

Now, if you're unfamiliar with Anna's Archive, it's what's often called a "shadow library"—a website that provides free access to enormous collections of copyrighted books, academic papers, and other materials without permission from rights holders. Publishers allege that the site contains tens of millions of pirated works. Yikes.

And publishers argue that sites like Anna's Archive are harming authors and publishers through piracy. And, of course, they claim that tech companies use shadow libraries as sources of training data for AI systems.

The legal battle might not be over for Anna's Archive since enforcing judgments against anonymous operators who often move domains and servers across international borders can be challenging.

And publishers have now filed another copyright infringement lawsuit, this time against the online pirate platform WeLib. It's the same thing here: the plaintiffs allege that WeLib has been operating as a massive unauthorized library and distributing copyrighted books without permission from authors or publishers. According to the complaint, the site offers access to millions of pirated books, including newly released titles.

I do want to say that for years, many publishers treated shadow libraries and pirate sites as an unfortunate but largely unavoidable reality of the digital age. Bad move. Now publishers are finally showing up in court and arguing that pirate sites are undermining book sales and creating vast, centralized repositories of copyrighted works that are being used to train AI systems.

Literary magazine Granta has announced that it will end its relationship with the Commonwealth Short Story Prize following controversy surrounding one of this year's regional winners. There are allegations that artificial intelligence may have been used in the creation of the winning story, but the accused author denied those allegations, explaining that speech-to-text technology was used because of health issues.

For many disabled writers, dictation software and predictive text are accessibility tools, and before LLMs, using these tools was not challenged. But, as AI capabilities expand, the line between accommodation and authorship becomes increasingly difficult to define.

So, when does assistive technology become co-authorship? Now, I don't know the accused author, I haven't read the winning story in question, and I don't know how much AI was or wasn't used. If it is just a matter of speech-to-text technology being used, then this might just be a story about sore losers trying to move the goalposts. But given that Granta has pulled out of the short story prize, maybe there's something to these allegations. I don't know.

On a call with lawyers at the Authors Guild, we discussed authors being asked to prove their writing was not AI generated. One of the attorneys mentioned that there are programs and websites that will issue a certification of human authorship if you type your work in their program. The Authors Guild and their lawyers are still looking into programs like these, and so far they haven't come out and endorsed the use of any. But they might eventually.

I checked some of these programs out, and I just do not like the thought of authors having to live in this state of perpetual proof. And are the sites even secure?

One author I know emails a copy of her manuscript to herself every time she works on it. I used to think that was good enough, but now you also have to prove that the text in your document isn't just a collection of generative AI outputs that you copied and pasted. It's demoralizing. I don't see it as a compliment. They're not saying your writing is good enough to be AI; they're saying it's bad enough to be AI. Ironically, they're also saying it's good enough to publish if you did indeed write it.

I am certain that this is going to be an ongoing debate in publishing for a while, and I will keep you up to date.

So, whether or not you use AI in your work, the way things stand now, someone can come along after you and profit off of your work to the point that they end up outranking you on the internet.

Andy Baio recently published an investigation on Waxy.org titled “The Wholesale Plagiarism of Obscure Sorrows,” and it centers on John Koenig’s book and long-running creative project, The Dictionary of Obscure Sorrows.

Koenig’s project began in 2009.

Then a second website appeared.

At first glance, it looks like a polished official site for the book. It has an author bio and press mentions. The domain is close enough to the original that a casual visitor could easily assume it's legitimate.

But the site was not created by Koenig.

Even though there are purchase links for the book, the site also contains the entire text of the book, including the foreword, hundreds of invented words, definitions, etymologies, and essays. Koenig’s original artwork has been replaced with AI-generated images. Yuck. The site also added a feature inviting visitors to generate their own “sorrows” using AI.

Baio contacted Koenig directly. Koenig replied that he had nothing to do with the site.

Instead, a marketing and web design agency built a slick website around Koenig's book, republished the book’s full contents, added AI features, used AI-generated art, and then presented the project as part of its portfolio. The agency proudly states that all of its web pages were written using Claude and a personality they created on Claude that they call Qontour or Q.

In one place, the agency's site about "The Dictionary of Obscure Sorrows" acknowledges that the dictionary content belongs to Koenig. In another place on the site, the agency appeared to claim a Creative Commons license over “The Dictionary of Obscure Sorrows by Qontour,” even though it did not own the underlying material. Baio’s point is straightforward: you cannot relicense work you do not own.

The site also used Amazon affiliate links. That means the agency earns commissions from book sales driven through the unofficial site. And now, the unofficial site outranks Koenig’s official site, the publisher’s page, and Wikipedia for searches related to the book, the title, and even Koenig’s name. AI search tools identified the impostor site as the official website and attributed it to Koenig.

So this site is, for many searchers, the version of the project they will encounter first. That's just awful. Reputation is part of an author’s work. If a reader arrives at an unauthorized site filled with AI-generated art and AI word-generation tools, that reader might assume the author endorsed those choices. In this case, this site has people wondering if Koenig embraced AI or whether the original book was AI-generated.

That's not good. It takes years to build a body of work with a distinctive voice. It can take one polished impostor site to blur the public record.

This story shows why the current AI conversation can feel so personal to writers, and it certainly feels personal to me on behalf of all of the amazingly talented authors I've worked with over the years. We already had the problem of chatbots generating book summaries, usually with misinformation. But now we have to worry about agencies and opportunists using AI to repackage human work, optimize it for search, decorate it with synthetic images, and siphon attention away from the person who made the original thing.

Simon & Schuster reportedly filed DMCA takedown notices with Google last year for two pages from the bootleg site, but those efforts did not meaningfully reduce the site’s reach.

So, the author has the original project. The publisher has the legitimate book. But this unauthorized site has search visibility. Remember how often we've talked about the discoverability problem books have? Yeah, this isn't helping.

So, protecting your work these days means owning the copyright plus monitoring how your work appears in searches, how AI systems describe it, and whether unofficial sources are being mistaken for the real thing.

Baio argues that this represents the problem facing creators in the age of generative AI. The systems appear capable of absorbing highly distinctive creative work and reproducing it at scale. Writers have always influenced one another. Literary history is full of examples of authors borrowing forms, responding to predecessors, and adapting conventions. Most writers can point to a shelf full of books that shaped their own work.

A human writer might read The Dictionary of Obscure Sorrows, absorb its sensibility, and years later write something that bears traces of Koenig's influence, but an AI system can provide thousands of users with instant access to a close approximation of Koenig's creative project without those users ever encountering the original work or purchasing the book.

The story also exposes a challenge that current copyright law struggles to address.

Copyright protects specific expressions, but it generally does not protect ideas, styles, or concepts. Writers understand this. No one owns the enemies-to-lovers trope or the hard-boiled detective. Yet with Koenig's project, the invented words, their emotional resonance, and the distinctive sensibility of the work are so closely intertwined that separating idea from expression becomes difficult.

Users can ask AI systems to generate obscure emotions in the style of The Dictionary of Obscure Sorrows and receive responses that closely resemble Koenig's work. In some cases, the systems reproduce definitions almost verbatim. In others, they generate new entries that draw heavily on the structure, tone, and linguistic conventions that Koenig developed over many years.

Like I said, the most famous word to come from The Dictionary of Obscure Sorrows is sonder, which means the realization that each passerby is living a life as vivid and complex as your own. But let me read the entire definition so you can get a sense of the author's voice and the beauty of this book:

"sonder n. the realization that each random passerby is living a life as vivid and complex as your own—populated with their own ambitions, friends, routines, worries and inherited craziness—an epic story that continues invisibly around you like an anthill sprawling deep underground, with elaborate passageways to thousands of other lives that you’ll never know existed, in which you might appear only once, as an extra sipping coffee in the background, as a blur of traffic passing on the highway, as a lighted window at dusk."

Now that is an obscure sorrow. The beauty of Koenig's original work is his voice. Many writers spend decades developing a voice. We talk about finding our voice as though it were something waiting to be discovered, but voice is constructed slowly through thousands of decisions made over years of work. It becomes one of the few things in publishing that genuinely belongs to us.

Now you're telling me a system can absorb that voice and distribute approximations of it to millions of people, and we're just supposed to go, "oh, cool." Come on. If you don't get it yet, I love authors so much. This is heartbreaking.

Links to all of today's news stories can be found in the show notes of this episode. I encourage you to read Baio's investigation on Waxy.org, especially if you have ever built a distinctive body of work online and assumed that being the original creator would be enough to keep you at the center of your own project.

Next week we're going to discuss writing for the screen, the stage, and video games. Whether you're starting off writing for these or thinking you might want to adapt your story one day, I am looking forward to being with you again soon for another Writing Break.

Until then, thank you so much for listening, and remember, you deserved this break.

Thank you for making space in your mind for The Muse today.

Writing Break is hosted by America’s Editor and produced by Allon Media with technical direction by Gus Aviles. Visit us at writingbreak.com or contact us at [email protected].
