1-bit LLMs Could Solve AI’s Energy Demands

floofloof@lemmy.ca · 1 year ago

BigFig@lemmy.world · edit-2 1 year ago

Know what uses less? No LLMs

sugar_in_your_tea@sh.itjust.works · 1 year ago

Yay, I’m doing my part!

Naz@sh.itjust.works · 1 year ago

Try using a 1-bit LLM to test the article’s claim.

The perplexity loss is staggering. It’s like 75% accuracy lost or more. It turns a 30 billion parameter model into a 7 billion parameter model.

Highly recommended that you try to replicate their results.

Hupf@feddit.de · 1 year ago

https://xkcd.com/2934/

davidgro@lemmy.world · 1 year ago

But since it takes about 10% of the space (VRAM, etc.), it sounds like they could just start with a larger model and still come out ahead.
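Here is a rough back-of-envelope check of that 10% figure. It is a sketch that assumes ternary weights stored at their information-theoretic minimum of log2(3) ≈ 1.58 bits each, compared with 16-bit (FP16) weights, and it counts weight storage only. The 30B parameter count is borrowed from the comment above as a hypothetical. Practical packings often spend somewhat more than 1.58 bits per weight (for example 2 bits), so real savings would be a bit smaller.

```python
import math

def bits_per_ternary_weight() -> float:
    # A ternary weight takes one of 3 values {-1, 0, +1}; its information
    # content is log2(3) bits (~1.58, hence the "1.58-bit" name).
    return math.log2(3)

def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    # Raw weight storage only; ignores activations, KV cache, and packing overhead.
    return num_params * bits_per_weight / 8 / 2**30

params = 30e9  # hypothetical 30B-parameter model
fp16 = weight_memory_gib(params, 16)
ternary = weight_memory_gib(params, bits_per_ternary_weight())
print(f"FP16:    {fp16:.1f} GiB")
print(f"ternary: {ternary:.1f} GiB ({ternary / fp16:.1%} of FP16)")
```

The ratio is just log2(3)/16, so it does not depend on model size. The freed memory is what would let you "start with a larger model."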

kromem@lemmy.world · 1 year ago

There’s actually a perplexity improvement parameter-to-parameter for BitNet-1.58 which increases as it scales up.

So yes, post-training quantization causes apparent perplexity issues, but if you build quantization into training from the start, the result is better than full precision (FP).
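For readers wondering what "quantization in from the start" looks like, here is a minimal sketch of the absmean ternary quantizer described in the BitNet b1.58 paper. It scales the weights by their mean absolute value, rounds, and clips to {-1, 0, +1}. The sketch works on a plain Python list for illustration, and the function name and example weights are hypothetical. In quantization-aware training, as it is generally practiced, the optimizer keeps full-precision latent weights and applies this quantizer in the forward pass, passing gradients through with a straight-through estimator. That training loop is not shown here.

```python
def absmean_ternary(weights, eps=1e-5):
    """Quantize a flat list of weights to {-1, 0, +1} with an absmean scale.

    Divide by the mean absolute value (gamma), round to the nearest
    integer, then clip to [-1, 1]. Returns (ternary_weights, gamma).
    """
    gamma = sum(abs(w) for w in weights) / len(weights)
    q = [max(-1, min(1, round(w / (gamma + eps)))) for w in weights]
    return q, gamma

w = [0.8, -0.05, -1.3, 0.2, 0.4]  # hypothetical weights
q, gamma = absmean_ternary(w)
print(q, round(gamma, 3))
```

Small weights collapse to 0 and large ones saturate at ±1. With quantization in the training loop, the network can adapt to that constraint instead of having it imposed afterwards.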

Which makes sense through the lens of the superposition hypothesis, where the weights are actually representing a hyperdimensional virtual vector space. If the weights have too much precision, competing features might compromise on fuzzier representations instead of restructuring the virtual network toward better-matching nodes.

Constrained weight precision is probably going to be the future of pretraining within a generation or two looking at the data so far.

1 year ago

We invented multi-bit models to get more accuracy, since neural networks are based on human brains, which are 1-bit models themselves. A 2-bit neuron is 4 times as capable as a 1-bit neuron but has only double the size and power requirements. This whole thing sounds like BS to me. But then again, maybe complexity is more efficient than per-unit capability, since that’s the tradeoff.

Wappen@lemmy.world · 1 year ago

Human brains aren’t 1-bit models. Far from it, actually. I’m not an expert, but I know that neurons in the brain encode different signal strengths in their firing frequency.

1 year ago

Firing is on and off.

conciselyverbose@sh.itjust.works · 1 year ago

We really don’t know jack shit, but we know more than enough to know fire rate is hugely important.
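To make the firing-rate point concrete: each spike is all-or-none, but a graded value can still be carried by how often spikes occur. The toy encoder below is an illustrative assumption. It uses a simple accumulate-and-fire rule, it is not a biological model, and the function name is hypothetical. It only shows that a binary signal over time can represent more than one bit, not how real neurons work.

```python
def rate_encode(intensity: float, steps: int) -> list[int]:
    """Encode a graded value in [0, 1] as a binary spike train.

    Toy accumulate-and-fire rule: add the input to an accumulator each
    step, emit a spike (1) when it reaches 1, then subtract 1.
    """
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity must be in [0, 1]")
    acc = 0.0
    spikes = []
    for _ in range(steps):
        acc += intensity
        if acc >= 1.0:
            spikes.append(1)
            acc -= 1.0
        else:
            spikes.append(0)
    return spikes

for level in (0.1, 0.5, 0.9):
    train = rate_encode(level, 100)
    # Every entry is 0 or 1, yet the spike rate tracks the input level.
    print(level, sum(train) / len(train))
```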

kromem@lemmy.world · edit-2 1 year ago

The network architecture seems to create a virtualized hyperdimensional network on top of the actual network nodes, so the node precision really doesn’t matter much as long as quantization occurs in pretraining.

If it’s post-training, it’s degrading the precision of the already encoded network, which is sometimes acceptable but always lossy. But when it’s done during pretraining, it actually seems to be a net improvement over higher-precision weights, even if you throw efficiency concerns out the window.
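The "always lossy" part of post-training quantization shows up in a toy round trip. The sketch snaps already-trained weights to three levels with an absmean scale, multiplies back by the scale, and measures what was lost. The absmean scheme from BitNet b1.58 is assumed here for illustration, and the weights are hypothetical. Weight mean squared error is only a rough proxy for the perplexity loss discussed above.

```python
def ternary_round_trip(weights, eps=1e-5):
    # Post-training quantization sketch: quantize already-trained weights to
    # {-1, 0, +1} with an absmean scale, then dequantize by multiplying back.
    gamma = sum(abs(w) for w in weights) / len(weights)
    q = [max(-1, min(1, round(w / (gamma + eps)))) for w in weights]
    return [v * gamma for v in q]

def mse(a, b):
    # Mean squared difference between the original and restored weights.
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

trained = [0.8, -0.05, -1.3, 0.2, 0.4]  # hypothetical "already trained" weights
restored = ternary_round_trip(trained)
print(restored, mse(trained, restored))
```

The error is nonzero because the information lost in rounding cannot be recovered afterwards. Training with the quantizer in the loop lets the network organize itself around the three levels, which this sketch does not model.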

You can see this in the perplexity graphs in the BitNet-1.58 paper.

lunar17@lemmy.world · 1 year ago

None of those words are in the bible

kromem@lemmy.world · edit-2 1 year ago

No, but some alarmingly similar ideas are in the heretical stuff actually.

buzz86us@lemmy.world · 1 year ago

We need to scale fusion

Frost-752@lemmy.world · 1 year ago

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10441807/

potatopotato@sh.itjust.works · 1 year ago

Making AI more efficient will just mean more AI.

Knock_Knock_Lemmy_In@lemmy.world · 1 year ago

Smaller and speedier means larger token windows and greater variety of models.

Not less energy.
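The argument in these two comments is the rebound effect, often called the Jevons paradox. Total energy is per-query energy times the number of queries, so an efficiency gain can be outweighed by growth in use. The numbers below are purely hypothetical, and whether total consumption actually rises depends on how demand responds.

```python
def total_energy(energy_per_query_wh: float, queries: float) -> float:
    # Total energy is simply per-query energy times the number of queries.
    return energy_per_query_wh * queries

# Hypothetical numbers, chosen only to illustrate the rebound effect:
# a 10x efficiency gain paired with 20x more usage.
before = total_energy(energy_per_query_wh=3.0, queries=1e6)
after = total_energy(energy_per_query_wh=3.0 / 10, queries=1e6 * 20)
print(before, after, after / before)
```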
