Did anyone expect them to go “oh, okay, that makes sense after all”?
At the crux of the authors’ lawsuit is the argument that OpenAI is ruthlessly mining their material to create “derivative works” that will “replace the very writings it copied.”
The authors shoot down OpenAI’s excuse that “substantial similarity is a mandatory feature of all copyright-infringement claims,” calling it “flat wrong.”
Goodbye Star Wars, Avatar, Tarantino’s entire filmography, every slasher film since 1974…
Uh, yeah, a massive corporation sucking up all intellectual property to milk it is not the own you think it is.
But this is literally people trying to strengthen copyright and its scope. The corporation is, out of pure convenience, using copyright as it exists currently with the current freedoms applied to artists.
Listen, it’s pretty simple. Copyright was made to protect creators on initial introduction to market. In modern times it’s good if an artist has one lifetime, i.e. their lifetime, of royalties, so that they can at least make a little something - because for the small artist that little something means food on their plate.
But a company, sitting on a Smaug’s hill worth of intellectual property, “forever less a day”? Now that’s bonkers.
But you, scraping my artwork to resell for pennies on the dollar via some stock material portal? Can I maybe crawl up your colon with sharp objects and kindling to set up a fire? Pretty please? Oh pretty please!
Also, if your AI copies my writing style, I will personally find you, rip open your skull AND EAT YOUR BRAINS WITH A SPOON!!! Got it, devboy?
Won’t be Mr Hotshot with pointy objects and a fire up your ass, as well as less than half a brain… even though I just took a couple of bites.
Chew on that one.
EDIT: the creative writer is doomed, I tells ya! DOOOOOOMED!
This is remarkably aggressive and assumptive. It also addresses none of my beliefs substantively so not much to really chew on there.
You let me know if you ever want to chat about the issue, but right now it looks like you just want to vent. Feel free to do that but I’m not going to just be an object of your anger.
Actually, I hid all that in the goop - but it went past ya because you didn’t want to read with your mind’s eye. You also weren’t showing any sympathy, but false sympathy, because you just wanted to dismiss the person and not their concern. This is called an “ad hominem”. You argue the person rather than the points, and they did substantially tell you that it’s a matter of being able to be paid for your labours.
A little hint: copyright is there to protect creators from overreach, which should be fairly obvious. Both mass consolidation of intellectual property and fuzzing of copyright through AI are also abuses of the very founding principles of copyright.
But sometimes people just want to dismiss, and it becomes easier if someone is upset, cus then you can take the high road about stuff and people will be happy with it…
Ok, byyyyye~ _
This actually reminds me of a sci-fi I read where, in the future, they use an AI to scan any new work in order to see what intellectual property of the big corporations may have been used as an influence, in order to halt the production of any new media not tied to a pre-existing IP, including 100% of independent and fan-made works.
Which is one of the contributing factors towards the apocalypse. So 500 years later, after the apocalypse has been reversed and human colonies are enjoying post-scarcity, one of the biggest fads is rediscovering the 20th century, now that all the copyrights have expired and people can datamine the ruins of Earth to find all the media that couldn’t be properly preserved heading into Armageddon thanks to copyright trolling.
It’s referred to in universe as “Twencen”
The series is called FreeRIDErs if anyone is curious. Unfortunately the series may never have a conclusion (untimely death of co-creator), but most of its story arcs were finished, so there’s still a good chunk of meat to chew through and I highly recommend it.
OpenAI is trying to argue that the whole work has to be similar to infringe, but that’s never been true. You can write a novel and infringe on page 302 and that’s a copyright infringement. OpenAI is trying to change the meaning of copyright; otherwise, the output of their model is oozing with various infringements.
Speaking of slasher films, does anybody know of any movies that have terrible everything except a really good plot?
The Godfather Part III
I don’t care what works a neural network gets trained on. How else are we supposed to make one?
Should I care more about modern eternal copyright bullshit? I’d feel more nuance if everything a few decades old was public-domain, like it’s fucking supposed to be. Then there’d be plenty of slightly-outdated content to shovel into these statistical analysis engines. But there’s not. So fuck it: show the model absolutely everything, and the impact of each work becomes vanishingly small.
Models don’t get bigger as you add more stuff. Training only twiddles the numbers in each layer. There are two-gigabyte networks that have been trained on hundreds of millions of images. If you tried to store those images verbatim in a file that size, they would each weigh barely a dozen bytes. And the network gets better as that number goes down.
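For anyone who wants to check the "barely a dozen bytes" figure, here is a rough back-of-the-envelope sketch. The two-gigabyte size comes from the comment above; the specific image counts are hypothetical stand-ins for "hundreds of millions", not figures for any particular model, and the function name is made up for illustration.

```python
def bytes_per_item(model_bytes: int, num_items: int) -> float:
    """Average number of model bytes available per training item."""
    return model_bytes / num_items


# Assumption: "two-gigabyte" is taken as 2 * 10**9 bytes.
MODEL_BYTES = 2 * 10**9

# Assumption: hypothetical training-set sizes in the "hundreds of millions" range.
for num_images in (100_000_000, 170_000_000, 500_000_000):
    share = bytes_per_item(MODEL_BYTES, num_images)
    print(f"{num_images:>11,} images: about {share:.1f} bytes each")
```

Under these assumptions the per-image share lands somewhere between a few bytes and a couple of dozen, which is the order of magnitude the comment is pointing at.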
The entire point is to force the distillation of high-level concepts from raw data. We’ve tried doing it the smart way and we suck at it. “AI winter” and “good old-fashioned AI” were half a century of fumbling toward the acceptance that we don’t understand how intelligence works. This brute-force approach isn’t chosen for cost or ease or simplicity. This is the only approach that works.
Models don’t get bigger as you add more stuff.
They will get less coherent and/or “forget” the earlier data if you don’t increase the parameters with the training set.
There are two-gigabyte networks that have been trained on hundreds of millions of images
You can take a huge tiff of an image, put it through JPEG with the quality cranked all the way down and get a tiny file out the other side, which is still a recognizable derivative of the original. LLMs are extremely lossy compression of their training set.
which is still a recognizable derivative of the original
Not in twelve bytes.
Deep models are a statistical distillation of a metric shitload of data. Smaller models with more training on more data don’t get worse, they get more abstract - and in adversarial uses they often kick big networks’ asses.
Which is why we shouldn’t be using something we don’t and can’t use properly.
Right, copyright law.
No, this will benefit capitalism and the wealthiest people the most. The rest of us will suffer because of this. People can only think of the positives of AI and never the negatives. This is weed all over again.
Motivation to discuss anything with you goes flying out the window, if you think ending marijuana prohibition is anything but positive for the common people. And you’re going to drop that turd in a completely unrelated punchbowl.
Marijuana is always characterized as positives and people always forget the negatives in every conversation. This is the exact same shit. Weed shouldn’t even be illegal but those dumb racist white men in the 60s-80s with their paranoia decided to outlaw it. Fuck, the exact doctors and psychologists that “analyzed” it said everything was bullshit, so they had professionals too, you dumbass. I’m not getting into racist history with you but take my first sentence as the argument.
Talk less.
I should; the average human is stupid.
I think the place we haven’t quite gotten to yet is that copyright is probably the wrong law for this. What the AI is doing is reverse engineering the author’s magic formula for creating new works, which would likely be patent law.
In the past this hasn’t really been possible for a person to do reliably, and it isn’t really quantifiable as far as filing a patent for your process, yet the AI does it anyway, leaving us in a weird spot.
US patent professional here
Ya, saying it isn’t possible to do under patent law is no exaggeration. Even making the patent applications possible to allow would require changes to 35 U.S.C. 112 (a, and probably also b) and 35 U.S.C. 101. This all assumes that all authors would have the time and money and energy to file a patent, which even with a good attorney is many, many hours of work, and filing pro se would be like writing a whole new book. After the patent is allowed, the costs of continuation applications to account for changes in the process as the author learns and grows would be a hellish burden. After this comes the 20 year lifespan of a patent (assuming all maintenance fees are paid, which is quite the assumption, those are not cheap) at which point the patent protections are dead and the author needs to invent a new process to be protected. Don’t even get me started on enforcing a patent.
Patent law is fundamentally flawed to be sure, but even if every author got infinite money and time to file patents with, the changes needed to patent law to let them do so would leave patent law utterly broken for other purposes.
Using patent law for this is a good idea to bring up but for the above reasons I don’t think it is viable at all. It would be better and more realistic to have congress change copyright law than to change patent law I think. Sadly, I don’t think that is particularly likely either. :(
Can’t they just create a brand new law, specifically to cover these use-cases we have here?
Edit: And thank you for your detailed answer! It was educational.
They could and imho (I’m not an expert on this) they probably should. This would fall under unfucking copyright though, or perhaps under a new thing alongside copyright and patent law (though that sounds like more work than updating copyright law). Amending it into patent law would be the toughest option. The simple answer as to why I think that is that the vibes are off.
As a rough analogy it would be like combating public flashers by changing the rules for the department of transportation rather than the criminal justice system (ignoring how fucked the criminal justice system is).
And patent law is even more broken than copyright law.
I don’t know if I would say more broken; at least patents have limits on how long they can exist for, putting an upper bound on how much damage they can cause. Then again, limiting the production of vaccines during a pandemic is a lot more urgent than letting people do Mickey Mouse cartoons, so the standard for what counts as broken has to be a lot more stringent. It is more important for patent law to not be broken than it is for copyright law, so the same amount of brokenness feels worse with patents.
seethe
Very concerning word use from you.
The issue art faces isn’t that there’s not enough throughput, but rather there’s not enough time, both to make them and enjoy them.
That’s always been the case, though, imo. People had to make time for art. They had to go to galleries, see plays and listen to music. To me it’s about the fair promotion of art, and the ability for the art enjoyer to find art that they themselves enjoy rather than what some business model requires of them, and the ability for art creators to find a niche and to be able to work on their art as much as they would want to.
Headline is stupid.
Millennial journalism has fucking got to stop with these clown word choices…
Honestly it’s refreshing to not see the word “slammed” for once…
Let the boys be boys
Haha… This person gets it.
Copyright is already just a band-aid for what is really an issue of resource allocation.
If writers and artists weren’t at risk of losing their means of living, we wouldn’t need to concern ourselves with the threat of an advanced tool supplanting them. Never mind how the tool is created, it is clearly very valuable (otherwise it would not represent such a large threat to writers) and should be made as broadly available (and jointly-owned and controlled) as possible. By expanding copyright like this, all we’re doing is gatekeeping the creation of AI models to the largest of tech companies, and making them prohibitively expensive to train for smaller applications.
If LLMs are truly the start of a “fourth industrial revolution” as some have claimed, then we need to consider the possibility that our economic arrangement is ill-suited for the kind of productivity it is said AI will bring. Private ownership (over creative works, and over AI models, and over data) is getting in the way of what could be a beautiful technological advancement that benefits everyone.
Instead, we’re left squabbling over who gets to own what and how.
I take it we don’t use the phrase “good writers borrow, great writers steal” in this day and age…
Here’s current guidance from US Congress regarding AI copyright infringement.
Page 3 includes guidance on fair use.
Wah. Waaaah. Cry more, rich people.
Amazing how every new generation of technology has a generation of users of the previous technology who do whatever they can to stop its advancement. This technology takes human creativity and output to a whole new level, it will advance medicine and science in ways that are difficult to even imagine, it will provide personalized educational tutoring to every student regardless of income, and these people are worried about the technicality of what the AI is trained on and often don’t even understand enough about AI to even make an argument about it. If people like this win, whatever country’s legal system they win in will not see the benefits that AI can bring. That society is shooting itself in the foot.
Your favorite musician listened to music that inspired them when they made their songs. Listening to other people’s music taught them how to make music. They paid for the music (or somebody did via licensing fees or it was freely available for some other reason) when they listened to it in the first place. When they sold records, they didn’t have to pay the artist of every song they ever listened to. That would be ludicrous. An AI shouldn’t have to pay you because it read your book and millions like it to learn how to read and write.
I don’t think that Sarah Silverman and the others are saying that the tech shouldn’t exist. They’re saying that the input to train them needs to be negotiated as a society. And the businesses also care about the input to train them because it affects the performance of the LLMs. If we do allow licensing, watermarking, data cleanup, synthetic data, etc. in a way that is transparent, I think it’s good for the industry and it’s good for the people.
I don’t need to negotiate with Sarah Silverman if I’m handed her book by a friend, and neither should an AI.
But you do need to negotiate with Sarah Silverman if you take that book, rearrange the chapters, and then try to sell it for profit. Obviously that’s extremified but it’s the argument they’re making.
I agree. But that isn’t what AI is doing, because it doesn’t store the actual book and it isn’t possible to reproduce any part in a format that is recognizable as the original work.
Definitely not how that output works. It will come up with something that seems like a Sarah Silverman created work but isn’t. It’s like calling copyright on impersonations. I don’t buy it.
Yes. Imagine how much trouble ANY actor would be in if they were sued for impersonating someone nearly identical but not that person. If Sarah Silverman ever interacted with a person and then imitated that person on stage for her own personal benefit without the other person’s express consent it would be no different. And comedians pick up their comedy from everything around them, both natural and imitation.
100%. I just can’t get behind any of these arguments against AI from this segment of workers. This is no different than other rallies against technological evolution due to fear of job losses. Their scarce commodity will soon disappear and that’s what they’re actually afraid of.
It’s easy. They’re grasping at straws because their career isn’t what it used to be. It’s something new and viral so it must be an easy target to exploit for money. Personally I’d be on top of it and setting up contracts to allow AI to use my likeness for a small subset of the usual pay. I just can’t imagine not taking advantage of the ability to do absolutely nothing and still get paid for it. Instead they appear to actively be trying to tear it down. If they were wanting to set guidelines then they would be rallying Congress, not suing a company based on how they FEEL it should be.
That’s not what this is. To use your example it would be like taking her book and rearranging ALL of the words to make another book and selling that book. But they’re not selling the book or its contents, they’re selling how their software interprets the book for the benefit of the user. This would be like suing teachers for teaching about their book.
An LLM isn’t human and shouldn’t be treated the same as a human. It’s as foolish as corporate personhood.
The argument is less that an LLM is a human and more that it is not a copyright violation to use a material to train the LLM. By current legal definitions, it is fair use unless the material is able to be reproduced in its entirety (or at least, in some meaningful way).
By current legal definitions
Yeah, definitions that were written before this technology existed. I don’t base my opinions on what is legal; legality is nothing more than rules determined by those in power.
Instead, I base them on what is ethical, and the consumption of material by LLMs and other AIs without the express permission of its creator is unethical.
Except the AI owner does. It’s like sampling music for a remix or integrating that sample into a new work. Yes, you do not need to negotiate with Sarah Silverman if you are handed a book by a friend. However if you use material from that book in a work it needs to be cited. If you create an IP based off that work, Sarah Silverman deserves compensation because you used material from her work.
No different with AI. If the AI used intellectual property from an author in its learning algorithm, then if that intellectual property is used in the AI’s output the original author is due compensation under certain circumstances.
Neither citation nor compensation is necessary for fair use, which is what occurs when an original work is used for its concepts but not reproduced.
Sure, but fair use is rather narrowly defined. You must consider the purpose, nature, amount, and effect. In the case of scraping entire bodies of work as training data, the purpose is commercial, the nature is not in the public interest, the amount is the work in its entirety, and the effect is to compete with the original author. It fails to meet any criteria for fair use.
The work is not reproduced in its entirety. Simply using the work in its entirety is not a violation of copyright law, just as reading a book or watching a movie (even if pirated) is not a violation. The reproduction of that work is the violation, and LLMs simply do not store the works in their entirety nor are they capable of reproducing them.
It doesn’t have to be reproduced to be a copyright violation, only used. For example, publishing your Harry Potter fanfic would be infringement. You’re not reproducing the original material in any way, but you’re still heavily depending on it.
It is different. That knowledge from her book forms part of your processing and allows you to extract features and implement similar outputs yourself. The key difference between the AI module and dataset is that it’s codified in bits, versus whatever neural links we have in our brain. So if one theoretically creates a way to codify your neural network you might be subject to the same restrictions we’re trying to levy on ai. And that’s bullshit.
it’s a bit more than that if the AI is told to make something in the style of.
I mean people have been doing new works in the style of other artists for a while as well.
yeah, but again they can’t crank out a new one every 5 minutes, and actually it would overwhelm the courts as it’s very easy for those works to be too similar. take the guy who tried to sue Disney by writing a book based on Finding Nemo when he found out they were making a story like that. He was shady and tried to play timeline games but he did not need to make a story just like it.
Yeah, and a person could make something in the style of someone else. And it would only be copyright infringement if the work does not meaningfully change the original and give credit to the original artist.
How is this any different?
mainly because it’s just too easy. We should limit time periods for IP, but while it’s in force it should not be able to be used by AI, to me. Keep IP to 20 years and let AI have it at that point.
This technology takes human creativity and output to a whole new level,
No, it doesn’t. There’s nothing “human” or “creative” about the output of AI.