- cross-posted to:
- privacy@lemmy.ml
Admittedly, I know little about AI. However, once companies can no longer increase profit with AI, they will use it to cut costs instead. This will inevitably lead to mass layoffs, not because AI will correctly determine where to maximize revenue, but because executives don't understand how AI works, and they don't understand how their employees contribute to their revenue.
It'll also drive the revenue-maximizing sort of layoffs, which are also a really bad thing in a society where basic necessities are tied to employment. The execs will also fuck up a bunch in humorous ways, but that's nothing more than a comforting distraction from the real and present danger that automation at this level poses to a society built around employment.
The article doesn’t explain how that’s the case at all.
Aren’t all the big AI models trained on publicly available data?
Books3 is the definition of "not publicly available," because it's all pirated material downloaded from the private torrent tracker Bibliotik.
Books3 is literally why several AI groups are being sued by various authors, such as Sarah Silverman and George R.R. Martin.
Books3 was always illicitly obtained material, which calls into question whether training an LLM on it could really fall under Fair Use. (It most likely does, but it's still a legal question that hasn't been answered yet.)
Books3 Link: https://huggingface.co/datasets/the_pile_books3
Books3 Description from Link:
This dataset is Shawn Presser’s work and is part of EleutherAi/The Pile dataset.
This dataset contains all of bibliotik in plain .txt form, aka 197,000 books processed in exactly the same way as did for bookcorpusopen (a.k.a. books1). seems to be similar to OpenAI’s mysterious “books2” dataset referenced in their papers. Unfortunately OpenAI will not give details, so we know very little about any differences. People suspect it’s “all of libgen”, but it’s purely conjecture.