ChatGPT is full of sensitive private information and spits out verbatim text from CNN, Goodreads, WordPress blogs, fandom wikis, Terms of Service agreements, Stack Overflow source code, Wikipedia pages, news blogs, random internet comments, and much more.

  • @Usernameblankface@lemmy.world
    link
    fedilink
    English
    32 years ago

    I wonder if this kind of cut/paste happens with image generators. Do they sometimes output an entire image from their training data? Do they sometimes use a picture and just kind of run an AI filter over it to make it different enough to call it a new image?

    • brianorca
      link
      fedilink
      English
      82 years ago

      Diffusion AI (most image AI) works differently than an LLM. They actually start with noise, and adjust it iteratively to satisfy the prompt. So they don’t tend to reproduce entire images unless they are overtrained (i.e. the same image was trained a thousand times instead of once) or the prompt is overly specific. (i.e you ask for “The Mona Lisa by Leonardo”)

      But words don’t work well with diffusion, since dog and God are very different meanings despite using the same letters. So an LLM spits out a specific sequence of word tokens.