The training argument is probably going to come up dry by the time the court works its way through expert testimony, as the underlying argument for training as infringement is insane.
But where OpenAI is probably in hot water is that torrenting 100k books in the first place runs afoul of existing copyright legislation.
Everyone is debating the training in these suits, but the real meat and potatoes is going to be the initial infringement of obtaining the books, not how they were subsequently used.
Authors Guild, Inc. v. Google, Inc. decided that it is fair use to scan whole books and show verbatim snippets of them on the net. What AI does is far more transformative than that: very little of a book can be reproduced verbatim with AI (e.g. popular quotes), and you really just get “knowledge” from the books. Unlike with Google, though, the sources are lost in the process, which in itself also makes it difficult to argue for copyright violation, since you can’t point at what was actually copied.
Does this fall under the fair-use part of copyright?
It hasn’t been tested in court yet but I don’t see why it shouldn’t.