• Aatube
    -27 · 1 year ago

    robots.txt is purely textual; you can’t run JavaScript or log anything. Plus, one who doesn’t intend to follow robots.txt wouldn’t query it.

    • @BrianTheeBiscuiteer@lemmy.world
      English
      44 · 1 year ago

      If it doesn’t get queried, that’s the fault of the web scraper. You don’t need JS built into the robots.txt file either. Just add a line like:

      User-agent: *
      Disallow: /here-there-be-dragons.html

      Any client that hits that page (and maybe doesn’t pass a captcha check) gets banned. Or even better, they get a long stream of nonsense.
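For illustration, here is a minimal sketch of that trap in Python. It assumes the trap page is served at `/here-there-be-dragons.html` and that bans live in an in-memory set; the `Honeypot` class and the `nonsense` helper are hypothetical names, not part of any real server, and the captcha check is left out.

```python
import random
import string

TRAP_PATH = "/here-there-be-dragons.html"  # assumed URL of the trap page


class Honeypot:
    """Tracks which client IPs have touched the trap page (hypothetical helper)."""

    def __init__(self, trap_path=TRAP_PATH):
        self.trap_path = trap_path
        self.banned = set()

    def handle(self, ip, path):
        """Return an HTTP status code for a request, banning trap visitors."""
        if path == self.trap_path:
            self.banned.add(ip)
        if ip in self.banned:
            return 403  # banned clients are refused from now on
        return 200


def nonsense(n_words, seed=None):
    """Yield random lowercase 'words' to feed a banned client instead of a 403."""
    rng = random.Random(seed)
    for _ in range(n_words):
        length = rng.randint(3, 10)
        yield "".join(rng.choices(string.ascii_lowercase, k=length))
```

A real deployment would probably do this in the web server or firewall rather than in application code, and would want bans to expire.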

    • @ShitpostCentral@lemmy.world
      English
      11 · 1 year ago

      Your second point is a good one, but you absolutely can log the IP which requested robots.txt. That’s just a standard part of any HTTP server ever, no JavaScript needed.
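As a rough sketch of how little is needed, the snippet below pulls the requesting IPs out of an access log. It assumes the server writes Common Log Format lines (the default style for Apache and similar servers); the `robots_txt_requesters` function name is made up for this example.

```python
import re

# Common Log Format: ip ident user [time] "METHOD path PROTOCOL" status size
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*"'
)


def robots_txt_requesters(log_lines):
    """Return the set of client IPs that requested /robots.txt."""
    ips = set()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        path = match.group("path").split("?", 1)[0]  # ignore any query string
        if path == "/robots.txt":
            ips.add(match.group("ip"))
    return ips
```

Feeding it the lines of an access log gives the set of addresses that asked for the file; no JavaScript is involved at any point.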

    • @ricecake@sh.itjust.works
      English
      9 · 1 year ago

      People not intending to follow it is the real reason not to bother, but it’s trivial to track who downloaded the file and then hit something they were asked not to.

      Like, 10 minutes’ work to do right. You don’t need JS to do it at all.
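A minimal sketch of that tracking, using Python's standard `urllib.robotparser` to decide what counts as disallowed. The robots.txt contents, the `find_violators` name, and the `(ip, user_agent, path)` event shape are assumptions for the example; in practice the events would come from the server's access log.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt contents for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /here-there-be-dragons.html
"""


def find_violators(events, robots_txt=ROBOTS_TXT):
    """Return IPs that fetched robots.txt and later requested a disallowed path.

    `events` is an iterable of (ip, user_agent, path) tuples in request order.
    """
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())

    saw_robots = set()  # IPs that have downloaded robots.txt so far
    violators = []      # kept as a list to preserve first-seen order
    for ip, user_agent, path in events:
        if path == "/robots.txt":
            saw_robots.add(ip)
        elif ip in saw_robots and not rules.can_fetch(user_agent, path):
            if ip not in violators:
                violators.append(ip)
    return violators
```

Clients that never fetch robots.txt are not flagged by this sketch; catching those is what the trap page approach is for.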