Reddit has a new AI training deal to sell user content

Reddit has reportedly made a deal with an unnamed AI company to allow access to its platform’s content for the purposes of AI model training.

  • ME5SENGER_24
    11
    1 year ago

    FUCK REDDIT! FUCK U/SPEZ! The Red-exit shall endure, VIVA LA LEMMY!!

    • FartsWithAnAccent
      3
      1 year ago

      I’d be surprised if there wasn’t. I don’t think Spez and his cohorts are competent enough to completely suppress all information about it site-wide.

  • kingthrillgore
    8
    1 year ago

    When spez took away API access, he basically shit on the social contract that offered a fair exchange of free access for the content we fed into reddit. After the API change, there were new terms: there is no contract. There are no terms. If you use reddit now, you are giving away everything you are to be indexed and mangled by statistics. You exist as free labor to statisticians and machines.

    You are more than a few cents of bad memes.

    I’m going to make the request in the AM that Lemmy should add robots.txt rules to disallow AI crawlers, to at least indicate we’re not interested. We need legislation that tells scrapers what they can access.
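
    A minimal sketch of what such rules could look like, and how a well-behaved crawler would read them. The crawler tokens `GPTBot` and `CCBot`, the host `lemmy.example.org`, and the helper names are assumptions for illustration; none of them come from this thread. robots.txt is advisory only, so this signals intent rather than enforcing anything.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body. The user-agent tokens are illustrative
# examples of AI crawlers, not a list taken from the thread.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)

# Hypothetical instance URL, used only to exercise the rules.
POST_URL = "https://lemmy.example.org/post/1"

for agent in ("GPTBot", "CCBot", "SomeBrowser"):
    print(agent, is_allowed(parser, agent, POST_URL))
```

    Only crawlers that choose to honor robots.txt are affected; anything that ignores the file can still fetch the page.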

    • @General_Effort@lemmy.world
      1
      1 year ago

      We need legislation that tells scrapers what they can access.

      What do you hope that would achieve?

      Because I can only see this as benefitting Reddit, Facebook, and the like, while screwing over smaller players.

    • @Crack0n7uesday@lemmy.world
      10
      1 year ago

      They can and do, but they want the training data to come from highly moderated sources. Otherwise every AI chatbot would be spewing the most racist parts of 4chan, because people would train it that way as a joke.

      If you let AI roam freely across the internet, it would only learn porn, Sailor Moon, Dragon Ball Z, and Nazi Germany.

    • @Steak@lemmy.ca
      3
      1 year ago

      Dick dick pussy cunt cock dick pussy ass shit cunt shit motherfucker shit motherfucker ass tits cunt cock motherfucker shit ass tits motherfucker shit c’mon. Scrape that🔥

  • Lvxferre [he/him]
    6
    1 year ago

    For anyone looking for a gibberish generator to replace their Reddit content with, here’s one. This shit is like poison for those large models.

    For automated editing, I’m not sure what people can use nowadays. Just before the APIcalypse I used Power Delete Suite; I’m not sure if it still works, and I’m not creating a Reddit account just to test it out.

    • @greaprr@sh.itjust.works
      1
      1 year ago

      Not that I’m against telling Reddit to fuck off in no uncertain terms, but won’t providing this kind of poisoning to AI training just make it more resilient to exactly this kind of thing?

      • Lvxferre [he/him]
        1
        1 year ago

        I don’t think so. It’s really hard to sort the poison out of the data unless you actually have enough reading comprehension to know that it’s gibberish. Humans do; bots don’t. And even if they discard 80% of the poison, the remaining 20% is already screwing with the model.

        They could prevent you from editing your posts/comments, but that would cause an uproar.

  • @General_Effort@lemmy.world
    1
    1 year ago

    They say it’s $60 million on an annualized basis. I wonder who’d pay that, given that you can probably scrape it for free.

    Maybe it’s the AI Act in the EU. That might cause trouble for free scraping. The US is seeing a lot of rent-seeker PR, too, of course. That might cause some to hedge their bets.

    Some people may not have realized this yet, but limiting fair use benefits not just the traditional media corporations but also the likes of Reddit, Facebook, Apple, etc. Making “robots.txt” legally binding would only benefit the tech companies.