PR Announcement: https://medium.com/yandex/yandex-publishes-yalm-100b-its-the-largest-gpt-like-neural-network-in-open-source-d1df53d0e9a6
Github: https://github.com/yandex/YaLM-100B
The network was trained using the same principles as Megatron-LM; inference alone will require 4 A100s.
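A rough back-of-the-envelope sketch (not from the repo) of why inference lands at ~4 A100s, assuming fp16 weights and the 80 GB A100 variant; the exact footprint depends on the released checkpoint:

```python
# Memory arithmetic sketch: 100B params in fp16 don't fit on one 80 GB A100,
# so the weights get split tensor-parallel across several GPUs.
PARAMS = 100e9          # ~100B parameters
BYTES_PER_PARAM = 2     # fp16
TP_DEGREE = 4           # tensor-parallel degree used for inference

weights_gib = PARAMS * BYTES_PER_PARAM / 1024**3
per_gpu_gib = weights_gib / TP_DEGREE

print(f"total weights:        {weights_gib:.0f} GiB")   # ~186 GiB
print(f"per GPU at TP={TP_DEGREE}:      {per_gpu_gib:.0f} GiB")  # ~47 GiB, leaving headroom
                                                         # for activations and the KV cache
```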
> u/gwern
>
> No information on how they targeted the size/compute, or a loss curve, so that probably means they undertrained it, because no one can resist the temptation to claim a bigger parameter-count than they actually have the compute-budget for, and it won't outperform models you'd expect it to. (And I don't mean Chinchilla, I mean OPT and GPT-J-20b.)
Yes, a couple years ago.
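For context on the undertraining point in the quoted comment, here is a minimal sketch of the standard scaling rules of thumb (~20 training tokens per parameter from Chinchilla, and training compute C ≈ 6ND). The comment explicitly sets Chinchilla aside as not even being its bar, so treat this purely as an illustration of how large the compute budget for a "full-size" 100B model would be:

```python
# Illustrative only: what compute-optimal training would look like at 100B parameters,
# using the Chinchilla heuristic (~20 tokens per parameter) and C ~= 6 * N * D FLOPs.
PARAMS = 100e9
TOKENS_PER_PARAM = 20                        # Chinchilla rule of thumb

optimal_tokens = PARAMS * TOKENS_PER_PARAM   # ~2e12 (two trillion) tokens
train_flops = 6 * PARAMS * optimal_tokens    # ~1.2e24 FLOPs

print(f"compute-optimal tokens: {optimal_tokens:.1e}")
print(f"approx training FLOPs:  {train_flops:.1e}")
```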