• 1 Post
  • 75 Comments
Joined 5 months ago
Cake day: August 27th, 2025

  • Ah, as I recall, it’s because they polled users and there was an overwhelming “yes please”, based on Proton’s privacy stance.

    Given Proton is hosted in the EU, they’re likely quite serious about GDPR and zero data retention.

    Lumo is interesting. Architecturally, I mean, as an LLM enjoyer. I played around with it a bit, and stole a few ideas from them when I jury-rigged my system. Having said that, you could get a ton more with $10 on OpenRouter. Hell, the free models on there are better than Lumo, and you can choose to only use privacy-respecting providers.


  • Thank you for saying that and for noticing it! Seeing as you were kind enough to say that, I’d like to say a few things about how/why I made this stupid thing. It might be of interest to people. Or not LOL.

    To begin with, when I say I’m not a coder, I really mean it. It’s not false modesty. I taught myself this much over the course of a year and the reactivation of some very old skills (dormant for 30 years). When I decided to do this, it wasn’t from any school of thought or design principle. I don’t know how CS professionals build things. The last time I looked at an IDE was Turbo Pascal. (Yes, I’m that many years old. I think it probably shows, what with the >> ?? !! ## all over the place. I stopped IT-ing when Pascal, Amiga and BBS were still the hot new things.)

    What I do know is this: what was the problem I was trying to solve?

    IF the following are true:

    1. I have ASD. If you tell me a thing, I assume you’re telling me a thing. I don’t assume you’re telling me one thing but mean something else.
    2. An LLM could “lie” to me, and I would believe it, because I’m not a subject matter expert on the thing (usually). Also see point 1.
    3. I want to believe it, because why would a tool say X but mean Y? See point 1.
    4. An LLM could lie to me in a way that is undetectable, because I have no idea what it’s reasoning over or how it’s reasoning over it. It’s literally a black box. I ask a Question --> MAGIC WIRES --> Answer.

    AND

    1. “The first principle is that you must not fool yourself and you are the easiest person to fool”

    THEN

    STOP.

    I’m fucked. This problem is unsolvable.

    Assuming LLMs are inherently hallucinatory within bounds (AFAIK, the current iterations all are), if there’s even a 1% chance that it will fuck me over (it has), then for my own sanity, I have to assume that such an outcome is a mathematical certainty. I cannot operate in this environment.

    PROBLEM: How do I interact with a system that is dangerously mimetic and dangerously opaque? What levers can I pull? Or do I just need to walk away?

    1. Unchangeable. Eat shit, BobbyLLM. Ok.
    2. I can do something about that…or at least, I can verify what’s being said, if the process isn’t too mentally taxing. Hmm. How?
    3. Fine, I want to believe it…but, do I have to believe it blindly? How about a defensive position: “Trust but verify”? Hmm. How?
    4. Why does it HAVE to be opaque? If I build it, why do I have to hide the workings? I want to know how it works, breaks, and what it can do.

    Everything else flowed from those ideas. I actually came up with a design document (list of invariants). It’s about 1200 words or so, and unashamedly inspired by Asimov :)

    MoA / Llama-swap System

    System Invariants


    0. What an invariant is (binding)

    An invariant is a rule that:

    • Must always hold, regardless of refactor, feature, or model choice
    • Must not be violated temporarily, even internally. The system must not fuck me over silently.
    • Overrides convenience, performance, and cleverness.

    If a feature conflicts with an invariant, the feature is wrong. Do not add.


    1. Global system invariant rules:

    1.1 Determinism over cleverness

    • Given the same inputs and state, the system must behave predictably.

    • No component may:

      • infer hidden intent,
      • rely on emergent LLM behavior,
      • or silently adapt across turns without explicit user action.

    1.2 Explicit beats implicit

    • Any influence on an answer must be inspectable and user-controllable.

    • This includes:

      • memory,
      • retrieval,
      • reasoning mode,
      • style transformation.

    If something affects the output, the user must be able to:

    • enable it,
    • disable it,
    • and see that it ran.

    Assume the system is going to lie. Make its lies loud and obvious.
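
    To make “see that it ran” concrete: the shape I mean is roughly a per-turn receipt stapled to every answer. A minimal sketch, with field names invented for illustration (not my actual schema):

    ```python
    # Sketch of a per-turn "receipt" attached to every answer.
    # The point: every influence on the output is a visible, toggleable flag,
    # and the record of what actually ran travels with the answer.
    from dataclasses import dataclass, field


    @dataclass
    class TurnReceipt:
        memory_enabled: bool           # did stored memory influence this answer?
        retrieval_enabled: bool        # did KB retrieval run?
        reasoning_mode: str            # e.g. "default", "kb", "web"
        style_transform: str | None    # any rewrite pass applied to the output
        sources_used: list[str] = field(default_factory=list)  # doc IDs / hashes that grounded it


    def render(answer: str, receipt: TurnReceipt) -> str:
        """Append the receipt so the user can see exactly what ran."""
        return f"{answer}\n\n[ran: {receipt}]"
    ```

    If the receipt says retrieval didn’t run, the answer came from the model’s own priors, and I treat it accordingly.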


    On and on it drones LOL. I spent a good 4-5 months just revising a tighter and tighter series of constraints, so that 1) it would be less likely to break, and 2) if it did break, it would do so in a loud, obvious way.

    What you see on the repo is the best I could do, with what I had.

    I hope it’s something and I didn’t GIGO myself into stupid. But no promises :)



  • Agree-ish

    Hallucination is inherent to unconstrained generative models: if you ask them to fill gaps, they will. I don’t know how to “solve” that at the model level.

    What you can do is make “I don’t know” an enforced output, via constraints outside the model.

    My claim isn’t “LLMs won’t hallucinate.” It’s “the system won’t silently propagate hallucinations.” Grounding + refusal + provenance live outside the LLM, so the failure mode becomes “no supported answer” instead of “confident, slick lies.”

    So yeah: generation will always be fuzzy. Workflow-level determinism doesn’t have to be.
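
    In pseudo-ish Python, the outer constraint looks roughly like this. A sketch only; retrieve_fn and generate_fn are stand-ins for whatever retrieval and model call you actually use:

    ```python
    # Workflow-level guard: the model never answers "bare".
    # If retrieval finds nothing that supports the question, the pipeline
    # returns an enforced refusal instead of letting the LLM fill the gap.
    from typing import Callable


    def answer_with_refusal(question: str,
                            retrieve_fn: Callable[[str], list[dict]],
                            generate_fn: Callable[[str, list[dict]], str]) -> dict:
        snippets = retrieve_fn(question)          # grounding happens outside the model
        if not snippets:
            # Enforced "I don't know": no supporting bytes, no answer.
            return {"answer": None, "status": "no supported answer", "sources": []}
        draft = generate_fn(question, snippets)   # the only fuzzy step
        return {"answer": draft,
                "status": "grounded",
                "sources": [s["doc_id"] for s in snippets]}  # provenance rides along
    ```

    The LLM stays stochastic; the wrapper around it doesn’t.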

    I tried yelling, shouting, and even percussive maintenance, but the stochastic parrot still insisted “gottle of geer” was the correct response.


  • Cheers!

    Re: OpenAI API format: 3.6 - not great, not terrible :)

    In practice I only had to implement a thin subset: POST /v1/chat/completions + GET /v1/models (most UIs just need those). The payload is basically {model, messages, temperature, stream…} and you return a choices[] with an assistant message. The annoying bits are the edge cases: streaming/SSE if you want it, matching the error shapes UIs expect, and being consistent about model IDs so clients don’t scream “model not found”. Which is actually a bug I still need to squash some more for OWUI 0.7.2. It likes to have its little conniptions.
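
    If anyone’s curious what “thin subset” actually means, it’s roughly this much surface area. A FastAPI-flavoured sketch, not my actual code; streaming and the error shapes are left out:

    ```python
    # Minimal sketch of the two endpoints most UIs need before they treat you
    # as "OpenAI-compatible". Illustrative only.
    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()
    MODEL_ID = "my-local-router"   # be consistent here or clients scream "model not found"


    class ChatRequest(BaseModel):
        model: str
        messages: list[dict]
        temperature: float = 0.7
        stream: bool = False       # SSE streaming is the annoying optional bit


    @app.get("/v1/models")
    def list_models():
        return {"object": "list", "data": [{"id": MODEL_ID, "object": "model"}]}


    @app.post("/v1/chat/completions")
    def chat(req: ChatRequest):
        reply = run_backend(req.messages)   # hand off to the router / llama-swap side
        return {
            "id": "chatcmpl-local",
            "object": "chat.completion",
            "model": MODEL_ID,
            "choices": [{
                "index": 0,
                "message": {"role": "assistant", "content": reply},
                "finish_reason": "stop",
            }],
        }


    def run_backend(messages: list[dict]) -> str:
        """Stand-in for the real model call."""
        return "placeholder"
    ```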

    But TL;DR: more plumbing than rocket science. The real pain was sitting down with pen and paper and drawing what went where and what wasn’t allowed to do what. Because I knew I’d eventually fuck something up (I did, many times), I needed a thing that told me “no, that’s not what this is designed to do. Do not pass go. Do not collect $200”.

    shrug I tried.








  • Yeah.

    The SHA isn’t there to make the model smarter. It’s there to make the source immutable and auditable.

    Having been burnt by LLMs (far too many times), I now start from a position of “fuck you, prove it”.

    The hash proves which bytes the answer was grounded in, should I ever want to check it. If the model misreads or misinterprets, you can point to the source and say “the mistake is here, not in my memory of what the source was”.

    If it does that more than twice, straight in the bin. I have zero chill any more.

    Secondly, drift detection. If someone edits or swaps a file later, the hash changes. That means yesterday’s answer can’t silently pretend it came from today’s document. I doubt my kids are going to sneak in and change the historical prices of 8-bit computers (well, the big one might…she’s dead keen on being a hacker) but I wanted to be sure no one and no-thing was fucking with me.

    Finally, you (or someone else) can re-run the same question against the same hashed inputs and see if the system behaves the same way.

    So: the hashes don’t fix hallucinations (I don’t even think that’s possible, even with magic). The hashes make it possible to audit the answer and spot why hallucinations might have happened.
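
    Mechanically, the whole trick is about this much code. A sketch with made-up names, not my actual implementation:

    ```python
    # What the hash buys you: pin the exact bytes an answer was grounded in,
    # and detect drift later.
    import hashlib
    from pathlib import Path


    def sha256_of(path: Path) -> str:
        return hashlib.sha256(path.read_bytes()).hexdigest()


    def record_grounding(answer: str, sources: list[Path]) -> dict:
        """Store the answer alongside hashes of the bytes it was grounded in."""
        return {"answer": answer,
                "sources": {str(p): sha256_of(p) for p in sources}}


    def check_drift(record: dict) -> list[str]:
        """Return the sources whose bytes have changed since the answer was produced."""
        return [p for p, digest in record["sources"].items()
                if sha256_of(Path(p)) != digest]
    ```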

    PS: You’re right that interpretation errors still exist. That’s why Mentats does the triple-pass and why the system clearly flags “missing / unsupported” instead of filling gaps. The SHA is there to make the pipeline inspectable, instead of “trust me, bro”.

    Guess what? I don’t trust you. Prove it or GTFO.




  • re: the KB tool, why not just skip the LLM and do two chained fuzzy finds? (which knowledge base & question keywords)

    Yep, good question. You can do that, it’s not wrong. If your KB is small + your question is basically “find me the paragraph that contains X,” then yeah: two-pass fuzzy find will dunk on any LLM for speed and correctness.

    But the reason I put an LLM in the loop is: retrieval isn’t the hard part. Synthesis + constraint are. What an LLM is doing in KB mode (basically) is this:

    1. It turns the question into an extraction task. Instead of “search keywords,” it’s: “given these snippets, answer only what is directly supported, and list what’s missing.”

    2. Then, rather than giving you 6 fragments across multiple files, the LLM assembles the whole thing into a single answer, while staying source-locked (and refusing fragments that don’t contain the needed fact).

    3. Finally: it has “structured refusal” baked in. IOW, the whole point is that the LLM is forced to say “here are the facts I saw, and this is what I can’t answer from those facts”.

    TL;DR: fuzzy search gets you where the info lives. This gets you what you can safely claim from it, plus an explicit “missing list”.

    For pure retrieval: yeah, search. In fact, maybe I should bake in >>grep or >>find commands. That would be the right trick for “show me the passage”, not “answer the question”.
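
    If it helps, the contract behind steps 1-3 above boils down to a prompt shaped something like this. Illustrative wording only, not my actual template:

    ```python
    # KB-mode contract: answer only from the snippets, and force a "missing"
    # list instead of letting the model fill gaps.
    KB_PROMPT = """You may use ONLY the snippets below. Do not use prior knowledge.

    Snippets:
    {snippets}

    Question: {question}

    Respond in exactly this form:
    SUPPORTED: facts the snippets directly state, with snippet numbers.
    ANSWER: the answer assembled only from SUPPORTED facts.
    MISSING: anything the question needs that the snippets do not contain.
    If nothing is supported, write "ANSWER: no supported answer".
    """


    def build_kb_prompt(question: str, snippets: list[str]) -> str:
        numbered = "\n".join(f"[{i + 1}] {s}" for i, s in enumerate(snippets))
        return KB_PROMPT.format(snippets=numbered, question=question)
    ```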

    I hope that makes sense?


  • Oh shit! Uh…thank you! Umm. Yes. That was unexpected :)

    Re: collab. I’m away for a bit with work, but let me think on it for a bit? There’s got to be a way to make this useful to more peeps.

    Believe it or not, I am not a CS guy at ALL (I work in health-care) and I made this for fun, in a cave, with a box of scraps.

    I’m not good at CS. I just have a … “very special” brain. As in, I designed this thing from first principles using invariants, which I understand now is not typical CS practice.


  • Fuck, I’m an all seeing, prescient wizard! Neat!

    Uh, I mean… “yes of course, I intended to have AI Horde piping baked in from the get-go”.

    Do let me know if my shit works with their shit to make…not shit.

    #####################

    EDIT: For fun, I asked my router to ask Qwen3-4B 2507 WTF AI Horde was. I’m showing this for interest’s sake, because 1) it shows you how the thing works IRL, and 2) it shows you how and when it’s WRONG in a LOUD way, if you know how to read it.

    ######################

    This is in default mode, no KBs attached. Just straight YOLOing the question at it, cold, and asking it to argue from its own internal priors.

    AI-horde is a decentralized platform where users rent or sell computational power (GPU time) to run AI models. It connects AI model developers who need processing with individuals or organizations who have idle GPU hardware. The platform uses blockchain-like tokens to manage payments and incentives. It’s not a single AI system but a marketplace for GPU resources.

    Missing details: Specific token mechanics, current model library, or user onboarding flow could affect accuracy. For example, if token economics changed recently, the explanation might not reflect current conditions.

    Confidence: Medium | Source: Model

    #####################################

    That’s the router’s way of saying “hey, best guess. Sorry”. And from what I can see, it’s sort of right, but the specifics are wrong. So: OK for a cheap YOLO, not for “I bet my money on this”. Let’s call this a 4/10.

    ####################################

    Now, I gave it web access privileges. It did this. Let’s call this 7/10

    An AI-horde is a free, open-source community service that allows anyone to contribute computing power to generate AI-generated images and text. It functions as a crowdsourced distributed cluster of workers focused on image and text generation tasks [1]. This model resembles collaborative projects like Folding@home, where users share computational resources for a common goal [2].

    ####################################

    Finally, here’s what it does in >>KB mode. I scraped the FAQ and then did >>summ new. Eyeballing it against the FAQ, I’d call this an 8.5/10. YMMV

    AI-horde is a decentralized platform for distributed AI computing. It allows users to rent out their idle GPU hardware to run AI models, connecting developers needing processing power with volunteers who provide it through open APIs. The system uses a non-cryptocurrency-based incentive called Kudos to prioritize contributions, where workers earn Kudos for completing tasks, and these are not tradable assets but rather internal metrics of service quality and project loyalty.

    Access to AI-horde is managed via API keys; anonymous usage is allowed but receives the lowest priority. The project emphasizes volunteerism and discourages commercialization of Kudos, treating its sale or exchange for money or cryptocurrency as contrary to its ethos. Participation as a worker involves installing the AI-horde client, selecting task types (e.g., image or text generation), configuring supported models, and running the worker to begin earning Kudos.

    Confidence: high | Source: Docs

    ################################

    FINALLY, for “fun”, I >>kb attached all and asked it to reason over the above three responses, in the context of what it has in the KB store, and rank them out of 10.

    [3] is most correct, scoring 9.5/10. It accurately reflects the core model: decentralized GPU renting via open APIs, Kudos as a non-crypto, non-tradable incentive tied to volunteerism, and the crowdsourced nature of contributions. It also correctly notes access via API keys and lower priority for anonymous use.

    [2] is accurate but less complete, scoring 7.5/10. It correctly identifies the crowdsourced, open-source, volunteer-based nature and analogy to Folding@home. However, it omits key specifics like Kudos, API key access, and the explicit non-commercialization stance.

    [1] is partially correct but misleading, scoring 6/10. It correctly describes GPU

    Confidence: medium | Source: Docs


    TL;DR

    • KB mode: source-based info, and when queried against it, adjudication accuracy = v. good
    • Clean, reproducible demo. Try it yourself

  • Responding to my own top post like a FB boomer: May I make one request?

    If you found this little curio interesting at all, please share in the places you go.

    And especially if you’re on Reddit, where the normies go.

    I used to post heavily on there, but then Reddit did a reddit and I’m done with it.

    https://lemmy.world/post/41398418/21528414

    Much as I love Lemmy and HN, they’re not exactly normcore, and I’d like to put this into the hands of people :)

    PS: I’m thinking of taking some of the questions you all asked me here (de-identified) and writing a “Q&A_with_drBobbyLLM.md” and sticking it on the repo. It might explain some common concerns.

    And, if nothing else, it might be mildly amusing.



  • Yes of course.

    Abliterated is a technical LLM term meaning “safety refusals removed”.

    Basically, abliteration removes the security theatre that gets baked into LLMs like ChatGPT.

    I don’t like my tools deciding for me what I can and cannot do with them.

    I decide.

    Anyway, the model I use has been modified with a newer, less lobotomy-inducing version of abliteration (lobotomisation being a real risk with the older method).

    https://huggingface.co/DavidAU/Qwen3-4B-Hivemind-Instruct-NEO-MAX-Imatrix-GGUF/tree/main

    According to validation I’ve seen online (and of course, I tested it myself), it’s lost next to zero “IQ” and dropped refusals by about…90%.

    BEFORE: Initial refusals: 99/100

    AFTER: Refusals: 8/100 [lower is better], KL divergence: 0.02 (less than 1 is great, “0” is perfect)
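
    For anyone wondering what that KL number is: it’s (roughly) how far the abliterated model’s next-token probabilities drift from the original’s on the same prompts, where 0 means identical behaviour. A toy sketch of the idea, not the actual validation script:

    ```python
    # Toy sketch: KL divergence between the original and abliterated model's
    # next-token distributions on the same prompt. 0 = identical behaviour.
    import math


    def kl_divergence(p: list[float], q: list[float]) -> float:
        """KL(P || Q) over two probability distributions of equal length."""
        return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


    # Example: two nearly identical next-token distributions.
    original = [0.70, 0.20, 0.10]
    abliterated = [0.68, 0.21, 0.11]
    print(round(kl_divergence(original, abliterated), 4))  # small value ~ little behaviour change
    ```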

    In fact, in some domains it’s actually a touch smarter, because it doesn’t try to give you “perfect” model answers. In maths reasoning, for example, where the answer is basically impossible, it will say “the answer is impossible. Here’s the nearest workable solution based on context” instead of getting stuck in a self-reinforcing loop, trying to please you, and then crashing.

    In theory, that means you could ask it for directions on how to cook Meth and it would tell you.

    I’m fairly certain the devs didn’t add the instructions for that in there, but if they did, the LLM won’t go “sorry, I can’t tell you, Dave”.

    Bonus: with my harness over the top, you’d have an even better idea if it was full of shit (it probably would be, because, again, I’m pretty sure they don’t train LLMs on Breaking Bad).

    Extra double bonus: If you fed it exact instructions for cooking meth, using the methods I outlined? It will tell you exactly how to cook Meth, 100% of the time.

    Say…you…uh…wanna cook some meth? :P

    PS: if you’re more of a visual learner, this might be a better explanation

    https://www.youtube.com/watch?v=gr5nl3P4nyM