Google says AI systems should be able to mine publishers’ work unless companies opt out, turning copyright law on its head

@0x815@feddit.de · 2 years ago

Google says AI systems should be able to mine publishers’ work unless companies opt out, turning copyright law on its head

@ConsciousCode@beehaw.org · 2 years ago

To be honest I’m fine with it in isolation, copyright is bullshit and the internet is a quasi-socialist utopia where information (an infinitely-copyable resource which thus has infinite supply and 0 value under capitalist economics) is free and humanity can collaborate as a species. The problem becomes that companies like Google are parasites that take and don’t give back, or even make life actively worse for everyone else. The demand for compensation isn’t so much because people deserve compensation for IP per se, it’s an implicit understanding of the inherent unfairness of Google claiming ownership of other people’s information while hoarding it and the wealth it generates with no compensation for the people who actually made that wealth. “If you’re going to steal from us, at least pay us a fraction of the wealth like a normal capitalist”.

If they made the models open source then it’d at least be debatable, though still suss since there’s a huge push for companies to replace all cognitive labor with AI whether or not it’s even ready for that. That itself is only a problem insofar as people need to work to live; professionally created media is art insofar as humans make it for a purpose, but corporations only care about it as media/content, so AI fits the bill perfectly. Corporations are artificial metaintelligences with misaligned terminal goals so this is a match made in superhell. There’s a nonzero chance corporations might actually replace all human employees and even shareholders and just become their own version of Skynet.

Really what I’m saying is we should eat the rich, burn down the googleplex, and take back the means of production.

@superkret@feddit.de · edit-2 2 years ago

deleted by creator

@ConsciousCode@beehaw.org · 2 years ago

That’s fair, also congratulations. Idk if I would count that towards contributing to the internet though, since it’s all within their walled garden on their own terms. It’s helpful for people, but only insofar as it helps Google. 10 years ago I might have been less critical since they were still in their “don’t be evil” phase and creating open source projects like Android left and right, something they’re evidently regretting now and trying to lock down using proprietary core apps. It’s also worth noting Google’s AI employees authored “Attention is all you need”, the paper which laid the groundwork for modern Transformer-based LLMs, though that’s an architecture and not a full model or code.

@cambriakilgannon@beehaw.org · 2 years ago

Or, if it was some non-profit doing the work for the good of everyone :')

@ConsciousCode@beehaw.org · 2 years ago

If only there were some kind of open AI research lab lmao. In all seriousness Anthropic is pretty close to that, though it appears to be a public benefit corporation rather than a nonprofit. Luckily the open source community in general is really picking up the slack even without a centralized organization, I wouldn’t be surprised if we get something like the Linux Foundation eventually.

modulus · 2 years ago

Worth considering that this is already the law in the EU. Specifically, the Directive (EU) 2019/790 of the European Parliament and of the Council of 17 April 2019 on copyright and related rights in the Digital Single Market has exceptions for text and data mining.

Article 3 has a very broad exception for scientific research: “Member States shall provide for an exception to the rights provided for in Article 5(a) and Article 7(1) of Directive 96/9/EC, Article 2 of Directive 2001/29/EC, and Article 15(1) of this Directive for reproductions and extractions made by research organisations and cultural heritage institutions in order to carry out, for the purposes of scientific research, text and data mining of works or other subject matter to which they have lawful access.” There is no opt-out clause to this.

Article 4 has a narrower exception for text and data mining in general: “Member States shall provide for an exception or limitation to the rights provided for in Article 5(a) and Article 7(1) of Directive 96/9/EC, Article 2 of Directive 2001/29/EC, Article 4(1)(a) and (b) of Directive 2009/24/EC and Article 15(1) of this Directive for reproductions and extractions of lawfully accessible works and other subject matter for the purposes of text and data mining.” This one’s narrower because it also provides that, “The exception or limitation provided for in paragraph 1 shall apply on condition that the use of works and other subject matter referred to in that paragraph has not been expressly reserved by their rightholders in an appropriate manner, such as machine-readable means in the case of content made publicly available online.”

So, effectively, this means scientific research can data mine freely without rights holders being able to opt out, and other uses for data mining, such as commercial applications, can data mine provided there has not been an opt-out through machine-readable means.
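
As a rough illustration of what checking for a machine-readable opt-out could look like in practice: the Directive does not name a specific mechanism, so treating robots.txt as the opt-out signal is an assumption here, and the crawler name ExampleMiningBot and the example.org URL are made up. A minimal Python sketch using the standard library’s robots.txt parser:

```python
from urllib.robotparser import RobotFileParser


def may_mine(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text does not reserve `url`
    against the crawler identified by `user_agent`.

    Assumption: robots.txt is being used as the "machine-readable means"
    of opting out; the Directive itself does not prescribe this.
    """
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt: the site opts out of one mining bot only.
EXAMPLE_ROBOTS = """\
User-agent: ExampleMiningBot
Disallow: /

User-agent: *
Allow: /
"""

if __name__ == "__main__":
    page = "https://example.org/gallery/1"
    print(may_mine(EXAMPLE_ROBOTS, "ExampleMiningBot", page))
    print(may_mine(EXAMPLE_ROBOTS, "SomeOtherBot", page))
```

Under Article 4, a commercial miner would be expected to skip anything where a check like this comes back negative; under Article 3, a research organisation would not need to run it at all.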

frog 🐸 · 2 years ago

I think the key problem with a lot of the models right now is that they were developed for “research”, without the rights holders having the option to opt out when the models were switched to for-profit. The portfolio and gallery websites, from which the bulk of the artwork came, didn’t even have opt-out options until a couple of months ago. Artists were therefore considered to have opted in to their work being used commercially because they were never presented with the option to opt out.

So at the bare minimum, a mechanism needs to be provided for retroactively removing works that would have been opted out of commercial usage if the option had been available and the rights holders had been informed about the commercial intentions of the project. I would favour a complete rebuild of the models that only draws from works that are either in the public domain or whose rights holders have explicitly opted in to their work being used for commercial models.

Basically, you can’t deny rights holders the ability to opt out, and then say “hey, it’s not our fault that you didn’t opt out, now we can use your stuff to profit ourselves”.

@Pseu@beehaw.org · 2 years ago

So at the bare minimum, a mechanism needs to be provided for retroactively removing works that would have been opted out of commercial usage if the option had been available and the rights holders had been informed about the commercial intentions of the project.

If you do this, you limit access to AI tools exclusively to big companies. They already employ enough artists to create a useful AI generator; they’ll simply add a clause to the employment contract stating that the artist agrees to their work being used in training. After a while, the only people who have access to reasonably good AI are those major corporations, and they’ll leverage that to depress wages and control employees.

The WGA’s idea that the direct output of an AI is uncopyrightable doesn’t distort things so heavily in favor of Disney and Hasbro. It’s also more legally actionable. You don’t name Microsoft Word as the editor of a novel because you used spell check even if it corrected the spelling and grammar of every word. Naturally you don’t name generative AI as an author or creator.

Though the above argument only really applies when you have strong unions willing to fight for workers, and with how gutted they are in the US, I don’t think that will be the standard.

frog 🐸 · edit-2 2 years ago

The solution to only big companies having access to AI by using enough artists to create a useful generator isn’t to deny all artists globally any ability to control their work, though. If all works can be scraped and added to commercial AI models without any payment to artists, you completely obliterate all artists except for the small handful working for Disney, Hasbro, and the like.

AI models actually require a constant input of new human-made artworks, because they cannot create anything new or unique themselves, and feeding an AI content produced by AI ends up with very distorted results pretty quickly. So it’s simply not viable to expect the 99% of artists who don’t work for big companies to continuously provide new works for AI models, for free, so that others can profit from them. Therefore, artists need either the ability to opt out or they need to be paid.

(The word “artist” here is used to refer to everyone in the creative industries. Writing and music are art just like paintings and drawings are.)

@Pseu@beehaw.org · 2 years ago

Unfortunately, copyright protection doesn’t extend that far. AI training is almost certainly fair use if it is copying at all. Styles and the like cannot be copyrighted, so even if an AI creates a work in the style of someone else, it is extremely unlikely that the output would be so similar as to be in violation of copyright. Though I do feel that it is unethical to intentionally try to reproduce someone’s style, especially if you’re doing it for commercial gain. But that is not illegal unless you try to say that you are that artist.

https://www.eff.org/deeplinks/2023/04/how-we-think-about-copyright-and-ai-art-0

frog 🐸 · 2 years ago

Copyright law on this varies, actually! In the UK, “fair dealing” actually has an exclusion for using copyrighted material for the purpose of commercially competing with the creator. This also includes derivative works. This does therefore cover style to a certain extent, because works imitating a style of an artist are generally intended to commercially compete with them. From that perspective, taking an artist’s entire portfolio, feeding it into an AI, and producing work in their style at a lower price than the artist does (because an AI produces something in seconds which takes the artist weeks), is pretty obviously an attempt to compete with the artist commercially.

While people like to draw comparisons between AIs and humans copying another artist’s style, the big difference here is that a human artist needs to spend hundreds of hours learning to imitate another artist’s style, at the expense of developing their own style, while the original artist is also continually developing their style. It is bloody hard to imitate another human’s art style. But an AI can do it in minutes, and I haven’t yet seen any valid arguments for how that’s not intended to commercially compete with human artists on a massive scale.

@Pseu@beehaw.org · 2 years ago

True, I wrote this from a US law perspective, where that kind of behavior is expressly protected. US law is also written specifically to protect things like search engines and aggregators to prevent services like Google from getting sued for their blurbs, but it’s likely also a defense for AI.

Regardless of whether it should be illegal or not, I feel that AI training and use is currently legal under US law. And since OpenAI is a US company, dragging it into UK courts and extracting payment from it would be difficult for all but the most monied artists.

frog 🐸 · 2 years ago

For the moment, US companies do actually care what the UK courts and regulatory bodies say, because the trifecta of US-UK-EU is what tends to form a base of what the rest of the world decides. It’s why Microsoft have been so unhappy about the UK’s Competition and Markets Authority initially blocking the merger with Blizzard: even with the US and EU antitrust bodies agreeing to it, it did actually matter if the UK didn’t agree (I am so disappointed in the CMA finally capitulating). And some of the lawsuits against the AI companies are taking place in the UK courts, with no indications that the AI companies are refusing to engage. Obviously at this point it’s hard to say what the outcome will be, but the UK legal system does actually have enough clout globally that it won’t be a meaningless result.

@P1r4nha@feddit.de · 2 years ago

Practically you would have to separate model architecture from weights. Weights are licensed as research use only, while the architecture is the actual scientific contribution. Maybe with some instructions on how best to train the model.

Only problem is that you can’t really prove if someone just retrained research weights or trained from scratch using randomized weights. Also certain alterations to the architecture are possible, so only the “headless” models are used.

I think there’s some research into detecting retraining, but I can imagine it’s not foolproof.

frog 🐸 · 2 years ago

I kind of think that as proof-of-concepts, the AI models are kind of interesting. I don’t like the content they produce much, because it is just so utterly same-y, so I haven’t yet seen anything that made me go “wow, that’s amazing”. But the actual architecture behind them is pretty cool.

But at this point, they’ve gone beyond researching an interesting idea into full-on commercial enterprises. If we don’t have an effective means of retraining the existing models to remove the data that isn’t licenced for commercial use (which is most of it), then it seems the only ethical way to move forward would be to start again with more selective training data, including only what is commercially licenced. Now that the research has been done on how to create these models, it should be quicker to build new ones with more ethically sourced training data.

SokathHisEyesOpen · 2 years ago

The standard needs to be opt-in, not opt-out. You can’t take people’s stuff without their permission. Just because they didn’t contact you and tell you directly that you’re not allowed to take their lawn ornaments doesn’t make them free.

modulus · 2 years ago

Why not? Copyright is a monopoly. Generally society benefits from having it as weak as possible.

@LastOneStanding@beehaw.org · 2 years ago

OK, so I shall create a new thread, because I was harassed. Why bother publishing anything original if it’s just going to be subsumed by these corporations? Why bother being an original human being with thoughts to share that are significant to the world if, in the end, they’re just something to be sucked up and exploited? I’m pretty smart. Keeping my thoughts to myself.

@Kwakigra@beehaw.org · 2 years ago

This is a tendency I’ve heard about that I haven’t been able to understand. What is the new risk of expressing your thoughts, prose, or poetry online that didn’t exist before and currently exists with LLMs scraping them? How would the corporations exploit your work through data scraping in a way that would demotivate you from expressing it at all? Because I know tone doesn’t come across well in text, I want to clarify that these are genuine questions, because my answers to these questions seem to be very different from many people’s and I’d like to understand where that difference in perspective comes from.

@EgoNo4@beehaw.org · 2 years ago

Google can go suck on a lemon!

SokathHisEyesOpen · edit-2 2 years ago

This is like the beginning of The Hitchhiker’s Guide to the Galaxy, where they put the responsibility on the main character to go to the local planning office’s cellar and see that they had posted a notice that they’re going to destroy his house. No Google, you don’t get to dictate that people come to your dark pattern website and tell you you’re not allowed to use their content. Disapproval is implied until people OPT-IN! It’s a good thing Google changed their motto from Don’t Be Evil or we’d have quite the conundrum.

FaceDeer · 2 years ago

Copyright law already allows generative AI systems to scrape the internet. You need to change the law to forbid something; it isn’t forbidden by default. Currently, if something is published publicly then it can be read and learned from by anyone (or anything) that can see it. Copyright law only prevents making copies of it, which a large language model does not do when trained on it.

@lostmypasswordanew@feddit.de · 2 years ago

An AI model is a derivative work of its training data and thus a copyright violation if the training data is copyrighted.

FaceDeer · 2 years ago

It is not a derivative work, the model does not contain any recognizable part of the original material that it was trained on.

frog 🐸 · 2 years ago

Except when it produces exact copies of existing works, or when it includes a recognisable signature or watermark?

NumbersCanBeFun · 2 years ago

deleted by creator

frog 🐸 · 2 years ago

The point is that if the model doesn’t contain any recognisable parts of the original material it was trained on, how can it reproduce recognisable parts of the original material it was trained on?

@ricecake@beehaw.org · 2 years ago

That’s sorta the point of it.
I can recreate the phrase “apple pie” in any number of styles and fonts using my hands and a writing tool. Would you say that I “contain” the phrase “apple pie”? Where is the letter ‘p’ in my brain?

Specifically, the AI contains the relationship between sets of words, and sets of relationships between lines, contrasts and colors.
From there, it knows how to take a set of words, and make an image that proportionally replicates those line-pattern and color relationships.

You can probably replicate the Getty Images watermark closely enough for it to be recognizable, but you don’t contain a copy of it in the sense that people typically mean.
Likewise, because you can recognize the artist who produced a piece, you contain an awareness of that same relationship between color, contrast and line that the AI does. I could show you a Picasso you were unfamiliar with, and you’d likely know it was him based on the style.
You’ve been “trained” on his works, so you have internalized many of the key markers of his style. That doesn’t mean you “contain” his works.

@BlameThePeacock@lemmy.ca · 2 years ago

A human is a derivative work of its training data, thus a copyright violation if the training data is copyrighted.

The difference between a human and AI is getting much smaller all the time. The training process is essentially the same at this point: show them a bunch of examples and then have them practice and provide feedback.

If that human is trained to draw on Disney art and then goes on to create similar-style art for sale, that isn’t copyright infringement. Nor should it be.

Phanatik · 2 years ago

This is stupid and I’ll tell you why.
As humans, we have a perception filter. This filter is unique to every individual because it’s fed by our experiences and emotions. Artists make great use of this by producing art which leverages their view of the world; it’s why Van Gogh and Picasso are interesting: they had unique views of the world that show through their work.
These bots do not have perception filters. They’re designed to break down whatever they’re trained on into numbers and decipher how the style is constructed so they can replicate it. They have no intention or purpose behind any of their decisions beyond straight replication.
You would be correct if a human’s only goal was to replicate Van Gogh’s style but that’s not every artist. With these art bots, that’s the only goal that they will ever have.

I have to repeat this every time there’s a discussion on LLMs or art bots:
The imitation of intelligence does not equate to actual intelligence.

frog 🐸 · 2 years ago

Absolutely agreed! I think if the proponents of AI artwork actually had any knowledge of art history, they’d understand that humans don’t just iterate the same ideas over and over again. Van Gogh, Picasso, and many others, did work that was genuinely unique and not just a derivative of what had come before, because they brought more to the process than just looking at other artworks.
