• @fartsparkles@lemmy.world · 116 points · 1 month ago

    If this passes, piracy websites can rebrand as AI training material websites and we can all run a crappy model locally to train on pirated material.

      • Pennomi · 24 points · 1 month ago

        It’s only theft if they support laws preventing their competitors from doing it too. Which is kind of what OpenAI did, and now they’re walking that idea back because they’re losing again.

      • @masterspace@lemmy.ca · 19 points · 1 month ago

        No it’s not.

        It can be problematic behaviour, and you can make it illegal if you want, but at a fundamental level, making a copy of something is not the same thing as stealing something.

        • @pyre@lemmy.world · 7 points · 1 month ago

          it uses the result of your labor without compensation. it’s not theft of the copyrighted material. it’s theft of the payment.

          it’s different from piracy in that piracy doesn’t equate to lost sales. someone who pirates a song or game probably does so because they wouldn’t buy it otherwise: either they can’t afford it or they don’t find it worth the price. so if they couldn’t pirate it, they still wouldn’t buy it.

          but this is a company using your labor without paying you, something they would otherwise definitely have to pay for. he literally says it would be over if they couldn’t get this data. they just don’t want to pay for it.

            • @masterspace@lemmy.ca · -2 points · 1 month ago

            That information is published freely online.

            Do companies have to avoid hiring people who read and were influenced by copyrighted material?

            I can regurgitate copyrighted works as well, and when someone hires me, places like Stack Overflow get fewer views on the pages I’ve already read and trained on.

            Are companies committing theft by letting me read the internet to develop my intelligence? Are they committing theft when they hire me so they don’t have to do as much research themselves? Are they committing theft when they hire thousands of engineers who have read and trained on copyrighted material to build up internal knowledge bases?

            What’s actually happening is that the debates around AI are exposing a deeply and fundamentally flawed copyright system. It should not be based on scarcity and restriction but on rewarding use. Information has always been able to flow freely; the mistake was linking payment to restricting its movement.

              • @pyre@lemmy.world · 0 points · 1 month ago

              it’s ok if you don’t know how copyright works. also maybe look into plagiarism. there’s a difference between relaying information you’ve learned and stealing work.

                • @Grimy@lemmy.world · 4 points · 1 month ago

                Training on publicly available material is currently legal. It is how your search engine was built and it is considered fair use mostly due to its transformative nature. Google went to court about it and won.

                  • @pyre@lemmy.world · -3 points · 1 month ago

                  can you point to the trial they won? I only know about a case that was dismissed.

                  because what we’ve seen from ai so far is hardly transformative.

        • @kibiz0r@midwest.social · 12 points · 1 month ago

          Also true. It’s scraping.

          In the words of Cory Doctorow:

          Web-scraping is good, actually.

          Scraping against the wishes of the scraped is good, actually.

          Scraping when the scrapee suffers as a result of your scraping is good, actually.

          Scraping to train machine-learning models is good, actually.

          Scraping to violate the public’s privacy is bad, actually.

          Scraping to alienate creative workers’ labor is bad, actually.

          We absolutely can have the benefits of scraping without letting AI companies destroy our jobs and our privacy. We just have to stop letting them define the debate.
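          To demystify what we’re even arguing about: scraping is just an HTTP GET plus some parsing. Here’s a minimal sketch (hypothetical URL and CSS selector), and it’s the same few lines whether the scraper is a search engine, an archivist, or an AI lab:

          ```python
          import requests
          from bs4 import BeautifulSoup

          # Fetch a page and pull out headline links. The mechanism is identical
          # for search indexing, archiving, and AI training; the debate is about
          # who scrapes and to what end. URL and selector are placeholders.
          resp = requests.get("https://example.com/articles", timeout=10)
          resp.raise_for_status()

          soup = BeautifulSoup(resp.text, "html.parser")
          for link in soup.select("h2 a"):  # placeholder selector
              print(link.get_text(strip=True), "->", link.get("href"))
          ```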

          • Grumuk · 4 points · 1 month ago

            Molly White also wrote about this in the context of open access on the web and people being concerned about how their works are being used.

            “Wait, not like that”: Free and open access in the age of generative AI

            The same thing happened again with the explosion of generative AI companies training models on CC-licensed works, and some were disappointed to see the group take the stance that, not only do CC licenses not prohibit AI training wholesale, AI training should be considered non-infringing by default from a copyright perspective.

          • @FauxLiving@lemmy.world · 0 points · 1 month ago

            Our privacy was long gone well before AI companies were even founded. If people cared about their privacy, none of the largest tech companies would exist, because they all spy on you wholesale.

            The ship has sailed on generating digital assets. This isn’t a technology that can be un-invented. Digital artists will have to adapt.

            Technology often disrupts jobs, and you can’t fix that by fighting the technology; it’s already invented. You fight the disruption by ensuring that your country takes care of people who lose their jobs, providing them with support and resources to adapt to the new job landscape.

            For example, we didn’t stop electronic computers to save the job of the computer (once a large field of highly trained humans who did calculations), and CAD destroyed the drafting profession. Digital artists are not the first to experience this and they won’t be the last.

            • @masterspace@lemmy.ca · 3 points · 1 month ago

              Our privacy was long gone well before AI companies were even founded. If people cared about their privacy, none of the largest tech companies would exist, because they all spy on you wholesale.

              In the US. The EU has proven that you can have perfectly functional privacy laws.

              If your reasoning is that the US doesn’t regulate its companies and that regulating them is therefore impossible, then your reasoning is bad.

              • @FauxLiving@lemmy.world · 5 points · 1 month ago

                My reasoning is based upon observing the current Internet from the perspective of working in cyber security and dealing with privacy issues for global clients.

                The GDPR is a step in the right direction, but it doesn’t guarantee your digital privacy. It’s more of a framework to regulate the trading and collecting of your personal data, not to prevent it.

                No matter who or where you are, your data is collected and collated into profiles which are traded between data brokers. Anonymized data is a myth: it’s easily deanonymized by data brokers, and data retention limits do essentially nothing.

                AI didn’t steal your privacy. Advertisers and other data consuming entities have structured the entire digital and consumer electronics ecosystem to spy on you decades before transformers or even deep networks were ever used.
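                To make “anonymized data is a myth” concrete, here’s a minimal linkage-attack sketch (toy data, hypothetical column names). Latanya Sweeney famously showed that ZIP code, birth date, and sex alone uniquely identify most Americans, so a single join against a public record set brings the names right back:

                ```python
                import pandas as pd

                # "Anonymized" records: names stripped, quasi-identifiers left in.
                anonymized = pd.DataFrame({
                    "zip":        ["02138", "02139"],
                    "birth_date": ["1945-07-31", "1960-01-02"],
                    "sex":        ["F", "M"],
                    "diagnosis":  ["hypertension", "diabetes"],
                })

                # Public record set (e.g. a voter roll): names AND the same fields.
                voter_roll = pd.DataFrame({
                    "name":       ["Jane Doe", "John Smith"],
                    "zip":        ["02138", "02139"],
                    "birth_date": ["1945-07-31", "1960-01-02"],
                    "sex":        ["F", "M"],
                })

                # One join and the "anonymous" diagnoses have names attached again.
                reidentified = anonymized.merge(voter_roll, on=["zip", "birth_date", "sex"])
                print(reidentified[["name", "diagnosis"]])
                ```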

          • @Grimy@lemmy.world · -1 points · 1 month ago

            Creators who are justifiably furious over the way their bosses want to use AI are allowing themselves to be tricked by this argument. They’ve been duped into taking up arms against scraping and training, rather than unfair labor practices.

            That’s a great article. Isn’t this kind of exactly what is going on here? Wouldn’t bolstering copyright laws make training unaffordable for everyone except a handful of companies? Then these companies, because of their monopoly, could easily make the highest-level models affordable only to the owner class.

            People are mad at AI because it will be used to exploit them, instead of being mad at the ones who exploit them every chance they get. Even worse, the legislation they shout for will make that exploitation even easier.

  • Phoenixz · 39 points · 1 month ago

    This is a tough one.

    OpenAI is full of shit and should die, but then again, so should copyright law as it currently stands.

    • @meathappening@lemmy.ml · 28 points · 1 month ago

      That’s fair, but OpenAI isn’t fighting to reform copyright law for everyone. OpenAI wants you to remain subject to the same restrictions you currently face while they themselves are exempt. This isn’t really an “enemy of my enemy” situation.

    • PropaGandalf · 11 points · 1 month ago

      yes, screw them both. let altman scrape all the copyrighted material and choke on it

    • @turnip@sh.itjust.works · 10 points · 1 month ago

      Surprisingly, Sam Altman hasn’t complained; he just said there’s competition and it will be harder for OpenAI to compete with open source. I think their small lead is essentially gone, and their plan is now to suckle Microsoft’s teat.

      • @HiddenLayer555@lemmy.ml · 8 points · 1 month ago

        it will be harder for OpenAI to compete with open source

        Can we revoke the word open from their name? Please?

  • @Rekorse@sh.itjust.works · 12 points · 1 month ago

    Getting really tired of these fucking CEOs calling their failing businesses “threats to national security” so big daddy government will come and float them again. Doubly ironic that it’s coming from a company who’s actually destroying the fucking planet while it achieves fuck-all.

      • @droplet6585@lemmy.ml · 8 points · 1 month ago

        They monetize it, erase authorship and bastardize the work.

        Like, if copyright was meant to protect against anything, it would be this.

  • Obligatory: I’m anti-AI, mostly anti-technology

    That said, I can’t say that I mind LLMs using copyrighted materials that it accesses legally/appropriately (lots of copyrighted content may be freely available to some extent, like news articles or song lyrics)

    I’m open to arguments correcting me. I’d prefer to have another reason to be against this technology, not arguing on the side of frauds like Sam Altman. Here’s my take:

    All content created by humans follows consumption of other content. If I read lots of Vonnegut, I should be able to churn out prose that roughly (or precisely) includes his idiosyncrasies as a writer. We read more than one author; we read dozens or hundreds over our lifetimes. Likewise musicians, film directors, etc etc.

    If an LLM consumes the same copyrighted content and learns how to copy its various characteristics, how is it meaningfully different from me doing it and becoming a successful writer?

    • Pennomi · 8 points · 1 month ago

      Right. The problem is not the fact that it consumes the information; the problem is whether the user uses it to violate copyright. It’s just a tool, after all.

      Like, I’m capable of violating copyright in infinitely many ways, but I usually don’t.

      • @SoulWager@lemmy.ml · 4 points · 1 month ago

        The problem is that the user usually can’t tell if the AI output is infringing someone’s copyright or not unless they’ve seen all the training data.
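        Even a naive verbatim check assumes you already have the source in hand. Here’s a toy sketch of such a check against one known work; the point is that without the training corpus, there’s no list of works to run it against:

        ```python
        # Toy overlap check: shared word 8-grams between model output and ONE
        # known work. You can only run this per work you already possess.
        def ngrams(text: str, n: int = 8) -> set[tuple[str, ...]]:
            words = text.lower().split()
            return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

        def shared_passages(output: str, source: str, n: int = 8) -> set[tuple[str, ...]]:
            return ngrams(output, n) & ngrams(source, n)

        model_output = "..."  # text the model produced
        known_source = "..."  # one copyrighted work you happen to have a copy of
        print(f"{len(shared_passages(model_output, known_source))} shared 8-grams")
        ```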

    • @ricecake@sh.itjust.works · 7 points · 1 month ago

      Yup. Violating IP licenses is a great reason to prevent it. According to current law, if they get a license for the book, they should be able to use it how they want.
      I’m not permitted to pirate a book just because I only intend to read it and then give it back. AI companies shouldn’t be able to either, if people can’t.

      Beyond that, we need to accept that we might need to come up with new rules for new technology. There are a lot of people, notably artists, who object to art they put on their website being used for training. Under current law, if you make it publicly available, people can download it and use it on their computer as long as they don’t distribute it. That current law allows something we don’t want doesn’t mean we need to find a way to interpret current law as not allowing it; it just means we need new laws that say “fair use for people is not the same as fair use for AI training”.

    • @kibiz0r@midwest.social · 7 points · 1 month ago

      If an LLM consumes the same copyrighted content and learns how to copy its various characteristics, how is it meaningfully different from me doing it and becoming a successful writer?

      That is the trillion-dollar question, isn’t it?

      I’ve got two thoughts to frame the question, but I won’t give an answer.

      1. Laws are just social constructs, to help people get along with each other. They’re not supposed to be grand universal moral frameworks, or coherent/consistent philosophies. They’re always full of contradictions. So… does it even matter if it’s “meaningfully” different or not, if it’s socially useful to treat it as different (or not)?
       2. We’ve seen with digital locks, gig work, algorithmic market manipulation, and playing either side of Section 230 when convenient… that the ethos of big tech is pretty much “define what’s illegal, so I can colonize the precise border of illegality, to a fractal level of granularity”. I’m not super stoked to come up with an objective quantitative framework for them to follow, cuz I know they’ll just flow around it like water and continue to find ways to do antisocial shit in ways that technically follow the rules.
    • @droplet6585@lemmy.ml · 5 points · 1 month ago

      and learns how to copy its various characteristics

      Because you are a human. Not an immortal corporation.

      I am tired of people trying to have iNtElLeCtUaL dIsCuSsIoN about/with entities that would feed you feet first into a wood chipper if they thought they could profit from it.

      • @Bassman1805@lemmy.world · 7 points · 1 month ago

        You can sue for anything in the USA. But it is pretty much impossible to successfully sue for “ripping off someone’s style”. Where do you even begin to define a writing style?

        • @catloaf@lemm.ee · 0 points · 1 month ago

          There are lots of ways to characterize writing style. Go read Finnegans Wake and tell me James Joyce doesn’t have a characteristic style.
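          Stylometry has been characterizing writing style quantitatively for decades. A toy fingerprint, purely illustrative and nothing like a legal standard: average sentence length plus function-word frequencies, the classic authorship-attribution features:

          ```python
          import re
          from collections import Counter

          # Common function words: their relative frequencies are a classic
          # authorship signal because writers use them unconsciously.
          FUNCTION_WORDS = ["the", "of", "and", "a", "to", "in", "that", "it", "is", "was"]

          def style_features(text: str) -> dict[str, float]:
              sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
              words = re.findall(r"[a-z']+", text.lower())
              counts = Counter(words)
              features = {"avg_sentence_len": len(words) / max(len(sentences), 1)}
              for w in FUNCTION_WORDS:
                  features[f"freq_{w}"] = counts[w] / max(len(words), 1)
              return features

          print(style_features("riverrun, past Eve and Adam's, from swerve of shore "
                                "to bend of bay, brings us by a commodius vicus of "
                                "recirculation back to Howth Castle and Environs."))
          ```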

      • @MrQuallzin@lemmy.world · 0 points · 1 month ago

        If that were the case, then Weird Al would be screwed.

  • @SomeAmateur@sh.itjust.works · 2 points · 1 month ago

    I think it would be interesting as hell if they had to cite where the data was from on request. See whether it’s legitimate sources or just what a Reddit user said five years ago.

  • @NewOldGuard@lemmy.ml · 2 points · 1 month ago

    Oh no, not the plagiarism machine, however would we recover???

    Please fail and die openai thx

    Also, copyright is bullshit and IP shouldn’t exist, especially for corporate entities. Free sharing of human knowledge and creativity should be a right. Machine plagiarism to create uninspired mimicries isn’t a necessary part of that process and should be regulated heavily.