Yandex Publishes YaLM 100B. It’s the Largest GPT-Like Neural Network in Open Source
The network is trained using the same principles as Megatron-LM; inference alone will require 4 A100s
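
A rough back-of-the-envelope check on the 4-GPU figure. This is only a sketch: it assumes fp16 weights (2 bytes per parameter) and the 80 GB A100 variant, neither of which is stated above, and it counts weights only. The helper name `min_gpus_for_weights` is made up for illustration.

```python
import math

def min_gpus_for_weights(n_params, bytes_per_param=2, gpu_mem_gb=80):
    """Return (weight size in GB, minimum GPU count to hold the weights).

    Assumptions (not from the announcement): fp16 weights and 80 GB cards.
    Activations, attention caches, and framework overhead are ignored.
    """
    weights_gb = n_params * bytes_per_param / 1e9
    return weights_gb, math.ceil(weights_gb / gpu_mem_gb)

weights_gb, gpus = min_gpus_for_weights(100e9)
print(f"weights: {weights_gb:.0f} GB, GPUs needed for weights alone: {gpus}")
```

Under these assumptions the weights alone come to about 200 GB, which already spills past two 80 GB cards. Runtime overhead and the fact that model-parallel setups usually want the model split evenly across devices would plausibly push the practical number up to 4.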


There's no information on how they targeted the size/compute, and no loss curve, which probably means they undertrained it: no one can resist the temptation to claim a bigger parameter count than they actually have the compute budget for. So I doubt it will outperform the models you'd expect it to. (And I don't mean Chinchilla, I mean OPT and GPT-NeoX-20B.)
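
To put a rough number on "undertrained", here is a sketch using two widely quoted rules of thumb: roughly 20 training tokens per parameter for a compute-optimal model (the Chinchilla heuristic) and training compute of about 6 × parameters × tokens. Both are approximations, the function name is hypothetical, and nothing here is a figure published for YaLM.

```python
def chinchilla_estimate(n_params, tokens_per_param=20):
    """Rule-of-thumb compute-optimal token count and training FLOPs.

    tokens_per_param=20 is the commonly cited Chinchilla approximation,
    and 6 * N * D is the usual estimate of dense-transformer training FLOPs.
    """
    tokens = tokens_per_param * n_params
    flops = 6 * n_params * tokens
    return tokens, flops

tokens, flops = chinchilla_estimate(100e9)
print(f"tokens: {tokens:.1e}, training FLOPs: {flops:.1e}")
```

By this heuristic a 100B-parameter model would want on the order of 2 trillion tokens. A model trained on far fewer than that would be undertrained for its size in the sense meant above.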


I wonder why they chose the Apache license as opposed to GPL or, better yet, AGPL. That would have benefited them and everyone else, because then Google or someone like it couldn't just take it all and use it in their service without giving anything back.

@AgreeableLandscape @OptimusPrime Well, I'm guessing Yandex wants to use it in closed-source applications as well. It's definitely better for it to be under the Apache license than to be proprietary.

Can someone ELI5 what this is and how it will benefit developers and researchers (if it will)?

Training models like this is very expensive, both because of the compute required and because you need to hire qualified experts to do it, which is why only large IT companies can afford regular access to this technology. If researchers can't get access to these models, research progress could slow down, so Yandex is open-sourcing its model, which is now the largest one ever released as open source.

Makes me sad that Russia has been stifled by corruption and mismanagement for the last three decades, with ever-shrinking budgets for education and ever-increasing military budgets. Makes me think about what cool things we could have done…

