Hey :) For a while now I've been using gpt-oss-20b on my home lab for lightweight coding tasks and some automation. I'm not up to date with the current self-hosted LLMs, and since the model I'm using was released at the beginning of August 2025 (from an LLM development perspective, that feels like an eternity to me), I wanted to tap the collective wisdom of Lemmy to see if there's something better out there to replace it with.
Edit:
Specs:
GPU: RTX 3060 (12GB vRAM)
RAM: 64 GB
gpt-oss-20b doesn't fit into VRAM completely, but with partial offloading it's reasonably fast (fast enough for me).
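For anyone curious how the offload is set up, here's a minimal sketch using llama-cpp-python. The GGUF filename and layer count are placeholders, not my exact setup; tune n_gpu_layers until VRAM is nearly full:

```python
# Minimal sketch of partial GPU offload with llama-cpp-python.
# Filename and layer count are placeholders -- raise n_gpu_layers
# until the 12 GB card is nearly full (watch nvidia-smi).
from llama_cpp import Llama

llm = Llama(
    model_path="gpt-oss-20b.Q4_K_M.gguf",  # placeholder GGUF file
    n_gpu_layers=24,  # layers kept on the GPU; the rest run from system RAM
    n_ctx=8192,       # context window; bigger contexts cost more VRAM
)

out = llm("Write a one-line summary of what partial offloading does.",
          max_tokens=64)
print(out["choices"][0]["text"])
```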
I find Qwen3.5 is the best at tool calling and agent use; otherwise Gemma4 is a very solid all-rounder and should be the first one you try. Tbh gpt-oss is still good to this day. Are you running into any problems with it?
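If it helps, this is roughly what tool calling looks like on my end: a hedged sketch assuming the model sits behind an OpenAI-compatible endpoint (Ollama's /v1 here); the model tag and the get_weather tool are placeholders I made up for illustration:

```python
# Hedged sketch of local tool calling, assuming the model is served behind
# an OpenAI-compatible endpoint (Ollama's /v1 here). The model tag and the
# get_weather tool are placeholders, not real APIs.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool
        "description": "Look up the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="qwen3.5",  # placeholder tag
    messages=[{"role": "user", "content": "What's the weather in Oslo?"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)  # populated if the model calls the tool
```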
No problems per se. I just realized I hadn't checked for anything newer in a while.
You’re probably aware, but updating the model periodically is a good idea just because things do change over time.
A model from two years ago was trained on data that's at least two years old, meaning any changes in technology, code, or world events since then won't be reflected in the model.
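For what it's worth, a tiny sketch of the refresh step, assuming you run Ollama and use its Python client (the tag is whatever you actually run):

```python
# Sketch of a periodic refresh, assuming Ollama and its Python client.
# Pulling re-fetches the tag, so you pick up any updated weights or
# chat templates published under it.
import ollama

TAG = "gpt-oss:20b"  # the tag you actually run

ollama.pull(TAG)
print(ollama.show(TAG))  # inspect what's now installed locally
```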
I also recommend gemma4 or qwen3.5. Both are super solid in my experience for how lightweight they are.
Still can't get my gemma to give me complete, bug-free components.
I guess I've been using gemma4 more for role-playing games. Qwen3.5 actually seems to be the better coder.
I’d say Qwen 3.5 and Gemma 4 beat GPT OSS in every aspect.
Gemma4 E4B at Q8 will fit in 12 GB and is good.
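Rough napkin math behind that claim (a sketch with assumed numbers; real usage also depends on context length and KV cache size):

```python
# Back-of-the-envelope VRAM estimate: weights plus a flat allowance for
# KV cache and buffers. All numbers here are rough assumptions.
def est_vram_gb(params_billion: float, bits_per_weight: float,
                overhead_gb: float = 1.5) -> float:
    weight_gb = params_billion * bits_per_weight / 8  # 1e9 params * bits / 8 ~ GB
    return weight_gb + overhead_gb

print(f"{est_vram_gb(4, 8):.1f} GB")   # ~4B active params at Q8 -> ~5.5 GB
print(f"{est_vram_gb(20, 4):.1f} GB")  # a 20B model at Q4 -> ~11.5 GB, hence offload
```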
I'm in the same boat. You'll get better responses if you post your machine specs.
I'd use some Chinese model. Qwen3.5 distilled from Claude 4.6 and abliterated is what I use.
How much VRAM?
Qwen is pretty good. Also try LFM models.
I'm running gemma4 26b MoE for most of my agent calls. I use glm5:cloud for my development agent because the 26b struggles when the context window gets too big.
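The routing itself is nothing fancy. Here's a sketch of the idea, with a crude word-count token estimate, a guessed threshold, and the model names from my setup plugged in:

```python
# Sketch of the routing idea: local model for short contexts, cloud model
# once the prompt grows. Token counting is a crude word-count proxy and
# the threshold is a guess; model names are the ones from my setup.
def pick_model(prompt: str, limit_tokens: int = 8000) -> str:
    approx_tokens = int(len(prompt.split()) * 1.3)  # rough words->tokens ratio
    return "gemma4:26b" if approx_tokens < limit_tokens else "glm5:cloud"

print(pick_model("short question"))  # -> gemma4:26b
print(pick_model("word " * 20000))   # -> glm5:cloud
```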
Have you tried the new gemma4 models? The E4B fits in 12 GB of memory and is pretty good. Or you can use the 31b too, if you're okay with offloading to the CPU.