• @myliltoehurts@lemm.ee
    English
    118
    1 year ago

    So they filled reddit with bot generated content, and now they’re selling back the same stuff likely to the company who generated most of it.

    At what point can we call an AI inbred?

    • @restingboredface@sh.itjust.works
      English
      16
      1 year ago

      I wonder if OpenAI or any of the other firms have thought to put in any kind of stipulations about monitoring and moderating reddit content to reduce AI-generated posts and reduce the risk of model collapse.

      Anybody who’s looked at reddit in the past 2 years especially has seen the impact of AI pretty clearly. If I was running OpenAI I wouldn’t want that crap contaminating my models.

        • metaStatic
          14
          1 year ago

          yep they fuckin got us

          but it’s not like our posts are safe here either. This is the world we live in now.

        • @db2@lemmy.world
          English
          10
          1 year ago

          They’re not multiple though, edit it and then delete it and it’s gone. They disabled all the tools to do it though so it’s manually or nothing now.

          • @SchmidtGenetics@lemmy.world
            English
            7
            1 year ago

            They just reload a previous cached comment, doesn’t matter how many times you edit or delete, it’s all logged and backed up.

        • @Imgonnatrythis@sh.itjust.works
          English
          4
          1 year ago

          Will be interesting to see if they stoop so low as to allow this. Probably wouldn’t be a super wise move as most deleted posts are likely material that would not be great to train on anyway. My first thought when I read this was, “well, not on MY posts” I’m clean off of reddit.

      • @bobs_monkey@lemm.ee
        English
        11
        edit-2
        1 year ago

        I used redact.dev to mass edit all my comments, worked pretty well. Problem is that if you mass delete, they’ll restore them pretty quick, but so far they haven’t reverted my edits.

      • @Rolando@lemmy.world
        English
        2
        1 year ago

        Back when I deleted all my comments, I was told I could claim to be in Europe and make a request citing the European law that Reddit has to follow. I think Reddit had a page where you could make the request, but of course it was hard to find.

    • @micka190@lemmy.world
      English
      6
      1 year ago

      Realistically, when you’re operating at Reddit’s scale, you’re probably keeping a history of each comment for analytics purposes.

  • @AlexWIWA@lemmy.ml
    English
    31
    1 year ago

    LLMs have been training on Reddit posts since at least 2012. Nothing really new here.

  • @filister@lemmy.world
    English
    25
    edit-2
    1 year ago

    What makes you think that they are not scraping Lemmy too? The only reason they might not be is probably how niche Lemmy and the fediverse are, but I am sure there have been people already doing it.

    • Dr. Moose
      English
      24
      1 year ago

      Fediverse is designed to do exactly that. It’s free flow of information which is a good thing. Don’t let corporations hijack this beautiful concept. We all want information to be free.

    • @olympicyes@lemmy.world
      English
      12
      1 year ago

      I’m not mad about the scraping. The LinkedIn scraping case pretty much cemented that there was nothing that could be done to stop it. I’m just mad that I can no longer use the app of my choice. No such problem with Lemmy.

    • @AlexWIWA@lemmy.ml
      English
      4
      1 year ago

      Lemmy is even easier to scrape. Just set up your own instance, then read the database after ActivityPub pushes everything to you.

    • @kia@lemmy.ca
      English
      3
      1 year ago

      I’m sure they are, but Reddit probably provides these companies with lots of personalized metadata they collect just for them which they may not get from Lemmy.

  • @Dark_Dragon@lemmy.dbzer0.com
    English
    18
    edit-2
    1 year ago

    Reddit banned me through IP address or something. Whatever new account I create gets banned within 24 hours, even if I don’t upvote a single post or comment. I tried 10 new accounts, each with a new email address, and all of them were banned. So I gave up and randomly changed all my good comments. Shifted permanently to Lemmy. I miss some of the most niche communities, but not enough to return to reddit.

    Edit: I didn’t even commit any rule violation. I just took too long to switch away from a modded reddit app. I only logged in once. That doesn’t warrant blocking me from ever using reddit.

    • @dumblederp@lemmy.world
      English
      2
      1 year ago

      If you use a VPN and a disposable email you can get about a week out of an account if you need to comment. It’ll get quietly shadowbanned though.

  • Dr. Moose
    English
    17
    1 year ago

    This form of propaganda is my pet peeve. It’s not “your posts”: as soon as you put something out in public, you don’t get to have your cake and eat it too. It’s out there, you shared it. Don’t share it if you don’t want humanity to ingest and use it.

    • Dataprolet
      English
      16
      1 year ago

      You’re technically right, but nobody anticipated, and therefore nobody agreed to, their posts being used for training LLMs.

    • @Azzu@lemm.ee
      English
      4
      edit-2
      1 year ago

      It’s not about it being used to train AI. It’s about the AI either not being open source/I don’t get access to it (i.e. not benefitting me) or reddit being paid for my comments (i.e. also not benefitting me).

      If this AI training would get me or the public access to the AI, or I would be paid for my comments instead of Reddit, I’d be fine with it.

      • Dr. Moose
        English
        5
        edit-2
        1 year ago

        yeah but you don’t get to choose that. You give away that right as soon as you participate in public discourse. It’s a zero sum game - either it’s public for everyone or for no one.

        Don’t get me wrong, Reddit is a bitch but I think people want to cut their noses off to spite their faces here. It’s much more important to have free information flow than to fuck reddit.

        My fear is that people will vote in some really dumb rules to spite AI and restrict free information flow accidentally.

        • @Azzu@lemm.ee
          English
          1
          edit-2
          1 year ago

          That’s how it is currently and maybe also your opinion. But that doesn’t mean it has to be like that in a society. It’s your opinion that everything public can go private at any time (training proprietary private AI), but we can decide as a society that’s not how we want to do things. We can require stuff that used public data to be public as well.

          And yeah I kinda get to choose that. As a democratic society, anything that the public (i.e. including me) decides, goes. Of course, if there are people like you that don’t want stuff trained on public data to be required to be public, democracy will also work in the sense that we don’t get that, as it is currently.

  • @Kyrgizion@lemmy.world
    English
    5
    1 year ago

    I didn’t delete my comments before nuking my account, but I’m pretty sure the grand majority were shitposts containing ample amounts of smut, gore and other ridiculous over the top shit. So I consider this a win.

  • @db2@lemmy.world
    English
    3
    1 year ago

    Not my posts. Go ahead, look at what remains. The rest was edited and then deleted.

    Fuck you, Steve. Right in the ass.

    • yeehaw
      English
      7
      1 year ago

      If only snapshots and backups were a thing…

      • @CeeBee@lemmy.world
        English
        5
        1 year ago

        It’s theoretically possible, but the issue that anyone trying to do that would run into is consistency.

        How do you restore the snapshots of a database to recover deleted comments but also preserve other comments newer than the snapshot date?

        The answer is that it’s nearly impossible. Not impossible, but not worth the massive monumental effort when you can just focus on existing comments which greatly outweigh any deleted ones.

        • yeehaw
          English
          1
          1 year ago

          It’s a piece of cake. Some code along the lines of:

          if ($user.modifyCommentRecentlyCount > 50) {
              # too many recent edits: treat it as a mass wipe and keep the old text
              print "user is nuking comments"
              $comment = $previousComment
          }

          Or some shit. It can be done quite easily, trust me.

          • @CeeBee@lemmy.world
            English
            1
            1 year ago

            It can be done quite easily, trust me.

            The words of every junior dev right before I have to spend a weekend undoing their crap.

            I’ve been there too many times.

            There are always edge cases you need to account for, and you can’t account for them until you run tests and then verify the results.

            And you’d be parsing billions upon billions of records. Not a trivial thing to do when running multiple tests to verify. And ultimately for what is a trivial payoff.

            You don’t screw around with infinitely invaluable prod data of your business without exhausting every single possibility of data modification.

            It’s a piece of cake.

            It hurts how often I’ve heard this and how often it’s followed by a massive screw up.

            • yeehaw
              English
              1
              1 year ago

              The words of every junior dev right before I have to spend a weekend undoing their crap.

              There are so many ways this can be done that I think you are not thinking of. Say a user goes to “shreddit” (or some other similar app) their comments. They likely have thousands. On every comment edit, it’s quite easy to check the last time the user edited one of their comments. All they need is some check like whether the last 10 consecutive comments were edited within hours or within milliseconds/seconds. After that, reddit could easily just tell the user it’s editing their comments when it’s not. Like a shadowban kind of method.

              Another way would be at the data structure level. We don’t know what their databases and hardware are like, but I can speculate. What if each user-edited comment is not an update query on a database, but an add/insert? Then all you need to do is point the live comments back at the revision dated before the malicious date, where username=$username. Not to mention when you start talking Nimble storage and stuff like that, the storage is extremely quick to respond. Hell, I would wager it didn’t even hit storage yet, probably still on some all-flash cache or in memory.

              Another way could be at the filesystem level. Ever heard of ZFS? What if each user had their own dataset or something? It’s extremely easy and quick to roll back a snapshot, or to clone the previous snapshot. There are so many ways.

              At the end of the day a user is triggering this action, so we don’t necessarily need to parse “billions” of records. Just the records for a single user.
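
The add/insert idea from this comment can be sketched in a few lines. This is only an illustration under stated assumptions: nothing here reflects Reddit's actual schema, and the `revisions` table, its columns, and the helpers `make_db`, `save_edit`, and `live_comments` are all hypothetical names. Each edit appends a row instead of overwriting the old text, so the comments of a single user can be read back as they stood before any chosen moment.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical append-only table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE revisions ("
        " comment_id INTEGER, username TEXT, body TEXT, edited_at INTEGER)"
    )
    return db


def save_edit(db, comment_id, username, body, edited_at):
    """Store an edit as a new row; earlier revisions are never overwritten."""
    db.execute(
        "INSERT INTO revisions VALUES (?, ?, ?, ?)",
        (comment_id, username, body, edited_at),
    )


def live_comments(db, username, before=2**62):
    """Return {comment_id: body} using each comment's newest revision
    whose timestamp is strictly earlier than `before`."""
    rows = db.execute(
        "SELECT r.comment_id, r.body FROM revisions AS r "
        "WHERE r.username = ? AND r.edited_at = ("
        "  SELECT MAX(p.edited_at) FROM revisions AS p "
        "  WHERE p.comment_id = r.comment_id AND p.edited_at < ?)",
        (username, before),
    ).fetchall()
    return dict(rows)
```

Calling `live_comments(db, "alice", before=500)` would return alice's comments as they stood before timestamp 500, touching only her rows. Whether this is cheap at real scale depends on indexing and storage details that the thread can only speculate about.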

              • @CeeBee@lemmy.world
                English
                1
                edit-2
                1 year ago

                There are so many ways this can be done that I think you are not thinking of.

                No, I can think of countless ways to do this. I do this kind of thing every single day.

                What I’m saying is that you need to account for every possibility. You need to isolate all the deleted comments that fit the criteria of the “Reddit Exodus”.

                How do you do that? Do you narrow it down to a timeframe?

                The easiest way to do this is identify all deleted accounts, find the backup with the most recent version of their profile with non-deleted comments, and insert that user back into the main database (not the prod db).

                Now you need to parse billions upon billions upon billions of records. And yes, it’s billions because you need the system to search through all the records to know which record fits the parameters. And you need to do that across multiple backups for each deleted profile/comment.

                It’s a lot of work. And what’s the payoff? A few good comments and a ton of “yes this ^” comments.

                I sincerely doubt it’s worth the effort.

                Edit: formatting

                • yeehaw
                  English
                  1
                  1 year ago

                  How do you do that? Do you narrow it down to a timeframe?

                  When a user edits a comment, they submit a response. When they submit a response, they trigger an action. An action can do validation steps and call methods, just like I said above, for example. When the edit action is triggered, check the timestamp against the previously edited comment’s timestamp. If the previous edit, or the previous 5, happened within a given timeframe, flag it. “Shadowban” the user. Make it look to them like they’ve updated their comments, but in reality they’re the same.

                  We’ve had detection methods for this sort of thing for a long time. Think about how spam filtering works. If you’re using some tool to scramble your data, it likely has patterns. To think reddit doesn’t have some means to protect itself against this is naive. It’s their whole business. All these user submitted comments are worth money.
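
A minimal sketch of the edit-rate check described in this comment, assuming a simple in-memory sliding window. The class name `EditBurstDetector`, the thresholds (5 edits within 10 seconds), and keeping flagged users in a set are illustrative assumptions, not anything known about Reddit's systems.

```python
from collections import defaultdict, deque


class EditBurstDetector:
    """Flag users who edit many comments within a short time window."""

    def __init__(self, max_edits=5, window_seconds=10.0):
        self.max_edits = max_edits            # edits that count as a burst
        self.window_seconds = window_seconds  # how far back to look
        self._recent = defaultdict(deque)     # username -> recent edit times
        self.flagged = set()                  # users whose edits get ignored

    def record_edit(self, username, timestamp):
        """Record one edit and return True if the user is flagged."""
        recent = self._recent[username]
        recent.append(timestamp)
        # Drop edits that have fallen out of the sliding window.
        while recent and timestamp - recent[0] > self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_edits:
            self.flagged.add(username)
        # Once flagged, a user stays flagged in this sketch.
        return username in self.flagged
```

In the "shadowban" variant from the comment, the edit handler would still report success to a flagged user but skip the actual write. A real system would presumably also need expiry, persistence, and tolerance for legitimate bulk edits, none of which is modeled here.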

                  Now you need to parse billions upon billions upon billions of records. And yes, it’s billions because you need the system to search through all the records to know which record fits the parameters. And you need to do that across multiple backups for each deleted profile/comment.

                  This makes me think you don’t understand my meaning. I think you’re talking about reddit one day deciding to search for and restore obfuscated and deleted comments. Yes, that would be a large undertaking. This is not what I’m suggesting at all. Stop it while it’s happening, not later. Patterns and trends can easily identify when a user is doing something like shreddit or the like, then the code can act on it.

                  It’s a lot of work. And what’s the payoff? A few good comments and a ton of “yes this ^” comments.

                  this

      • @Todgerdickinson@lemmy.world
        English
        2
        1 year ago

        Yea that’s the problem isn’t it. I had a great idea involving bullshit-efying my comments by editing them slowly with an LLM via a long-running script, repeatedly over months.

        I realised that they probably don’t delete the original text on edit anyway, which, as you say, is probably buried in a backup someplace.

  • @villainy@lemmy.world
    English
    2
    1 year ago

    “Strikes” made me think they were cancelling the deal. Like strike-through, crossed it out, etc. Too bad.