• @k_o_t@lemmy.ml
    6 points · 1 year ago

    recent advancements in LLMs that are small enough to be accessible to regular people (alpaca/llama), yet performant enough to sometimes match or even outperform chatGPT on specific tasks, are more interesting to me personally

    while the size of this model is certainly super impressive, even if the weights were released, it would need roughly half a terabyte of VRAM at int4 quantization at MINIMUM (int4 is half a byte per weight, so a model on the order of a trillion parameters works out to ~500 GB for the weights alone), so you’d need something like ~100k usd of GPUs just to run inference on this thing at decent accuracy :(
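
    back-of-the-envelope, weights only (the parameter counts below are just illustrative guesses, and activations / kv-cache add more on top):

        # memory needed just to hold the weights: params * bits / 8 bytes
        def weight_mem_gb(n_params: float, bits: int) -> float:
            return n_params * bits / 8 / 1e9

        # a gpt-3-sized model vs a speculative ~1T-parameter one
        for n_params in (175e9, 1e12):
            for bits in (16, 8, 4):  # fp16, int8, int4
                print(f"{n_params / 1e9:.0f}B params @ {bits}-bit: "
                      f"{weight_mem_gb(n_params, bits):.0f} GB")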

    • @phil_m@lemmy.ml
      3 points · 1 year ago

      Although I would find it really impressive, I don’t think we are anywhere near compressing that much information into so few parameters. Just think about all the specialized knowledge GPT-4 has. I think almost the whole internet is, more or less faithfully, encoded in the model somehow.

      As the title already suggests, the trend is the opposite: encoding and reasoning about even more data with bigger models and datasets.

      But yeah, it’s kind of concerning that all this power is concentrated in so few hands/companies.

      • @k_o_t@lemmy.ml
        2 points · 1 year ago

        certainly, more weights hold more general information, which is pretty useful if you’re using a model as a sort of secondary search engine, but models can be very performant on certain benchmarks while containing little general knowledge

        this isn’t really by design, and up until now (it’s still the case) we simply haven’t known how to build an LLM that can generate coherent text without absorbing a huge portion of the training material

        i’ve tried several models based on facebook’s llama LLMs, and i can say that the 13B and definitely the 30B versions are comparable to chatGPT in terms of quality (maybe not in terms of the amount of information they have access to, but definitely in other regards)
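
        for reference, a minimal sketch of what running a 4-bit quantized 13B llama locally can look like, assuming the huggingface transformers + bitsandbytes route (4-bit brings the 13B weights from ~26 GB in fp16 down to roughly 7 GB, which is what makes consumer gpus viable; the model id and settings are placeholders, not a specific setup):

            # illustrative only: load a llama-13b checkpoint in 4-bit and generate a short completion
            import torch
            from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

            model_id = "huggyllama/llama-13b"  # placeholder repo id; any llama-13b checkpoint works
            bnb_config = BitsAndBytesConfig(
                load_in_4bit=True,                     # 4-bit weights: ~13B params * 0.5 bytes ≈ 7 GB
                bnb_4bit_compute_dtype=torch.float16,  # do the matmuls in fp16
            )

            tokenizer = AutoTokenizer.from_pretrained(model_id)
            model = AutoModelForCausalLM.from_pretrained(
                model_id,
                quantization_config=bnb_config,
                device_map="auto",                     # spread layers across available gpu(s)/cpu
            )

            prompt = "Explain the difference between fission and fusion in one sentence."
            inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
            outputs = model.generate(**inputs, max_new_tokens=64)
            print(tokenizer.decode(outputs[0], skip_special_tokens=True))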