@ConsciousCode

@ConsciousCode@beehaw.org · 1 year ago

It’s important to be aware of their opinions because they quickly become policy and rhetoric while Dems do damage control and fail to fix the underlying issues Reps exploit. In this case, having an instance where they directly contradict their own “sleepy Joe” narrative can help deconstruct the beliefs of family members who have fallen for it.

@ConsciousCode@beehaw.org · edited 2 years ago

This is a sane and measured response to a terrorist attack /s. Just do terrorism back 100-fold, I guess?

@ConsciousCode@beehaw.org · 2 years ago

Network effects. The “we” you’re referring to could only be like 100 million at most. The vast majority of people don’t have the technical know-how to switch, or to articulate exactly why they feel miserable every time they log in for their daily fix.

@ConsciousCode@beehaw.org · 2 years ago

There are a lot of papers which propose adding new tokens to elicit some behavior or another, though I haven’t seen them catch on for some reason. A new token would mean adding a new trainable static vector which would initially be something nonsensical, and you would want to train it on a comparably sized corpus. This is a bit speculative, but I think the introduction of a token totally orthogonal to the original ones (something like smell, which has no textual analog) would require compressing some of the dimensions to make room for that subspace, otherwise it would have a form of synesthesia, relating that token to the original neighboring subspaces. If it was just a new token still within the original domain though, you could get a good enough initial approximation with a linear combination of existing token embeddings - e.g. a monkey-with-a-hat emoji comes out, you combine the embeddings of the monkey emoji and the hat emoji to initialize it, then finetune it.
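To make the "linear combination" idea concrete, here's a minimal sketch in plain Python. The embedding table, its values, and the `init_from_existing` helper are all made up for illustration; a real model would do the same thing on its actual embedding matrix and then finetune.

```python
# Toy embedding table: token -> static vector (values invented for illustration).
embeddings = {
    "monkey": [0.9, 0.1, 0.0, 0.3],
    "hat": [0.1, 0.8, 0.2, 0.0],
    "banana": [0.7, 0.0, 0.1, 0.5],
}

def init_from_existing(table, sources, weights=None):
    """Return a weighted sum of existing embeddings to seed a new token.

    With no weights given, this is a plain average of the source vectors.
    """
    if weights is None:
        weights = [1.0 / len(sources)] * len(sources)
    dim = len(next(iter(table.values())))
    new_vec = [0.0] * dim
    for token, weight in zip(sources, weights):
        for i, value in enumerate(table[token]):
            new_vec[i] += weight * value
    return new_vec

# Hypothetical new token. This vector is only a starting point; it would still
# need finetuning to become useful.
embeddings["monkey_with_hat"] = init_from_existing(embeddings, ["monkey", "hat"])
```

Averaging keeps the new vector inside the region the model already understands, which is the whole point of starting from existing tokens rather than random noise.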

Most extreme option: you could increase the embedding dimensionality so the original subspaces are unaffected and the new tokens can take up those new dimensions. This is extreme because it means resizing nearly every matrix in the model, which even for smaller models would mean adding many millions of parameters, and the performance would tank until it got a lot more retraining.
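For a rough sense of scale, here's a back-of-the-envelope estimate. It assumes the common approximation of about 12 * d_model^2 weights per transformer layer plus one embedding table, and a GPT-2-small-like shape; the function name and the 64 extra dimensions are just illustrative choices.

```python
def approx_params(n_layers, d_model, vocab_size):
    """Rough decoder-only transformer parameter count.

    Assumes about 12 * d_model^2 weights per layer (attention plus a 4x-wide
    MLP) and one vocab_size x d_model embedding table. Biases, layer norms and
    positional embeddings are ignored.
    """
    per_layer = 12 * d_model * d_model
    return n_layers * per_layer + vocab_size * d_model

# GPT-2-small-like shape, then the same shape with 64 extra embedding dimensions.
base = approx_params(n_layers=12, d_model=768, vocab_size=50257)
wider = approx_params(n_layers=12, d_model=768 + 64, vocab_size=50257)
print(f"base: {base:,}  wider: {wider:,}  added: {wider - base:,}")
```

Under these assumptions even a modest widening adds parameters in the tens of millions, all of them untrained and spread across every layer.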

(deleted original because I got token embeddings and the embedding dimensions mixed up, essentially assuming a new token would use the “extreme option”).

@ConsciousCode@beehaw.org · 2 years ago

LLMs are not expert systems, unless you characterize them as expert systems in language which is fair enough. My point is that they’re applicable to a wide variety of tasks which makes them general intelligences, as opposed to an expert system which by definition can only do a handful of tasks.

If you wanted to use an LLM as an expert system (I guess in the sense of an “expert” in that task, rather than a system which literally can’t do anything else), I would say they currently struggle with that. Bare foundation models don’t seem to have the sort of self-awareness or metacognitive capabilities that would be required to constrain them to their given task, and arguably never will because they necessarily can only “think” on one “level”, which is the predicted text. To get that sort of ability you need cognitive architectures, of which chatbot implementations like ChatGPT are a very simple version. If you want to learn more about what I mean, the most promising idea I’ve seen is the ACE framework.

Frameworks like this can allow the system to automatically look up an obscure disease based on the embedding distance to a particular query, so even if you give it a disease which only appears in the literature after its training cut-off date, it knows this disease exists (and is a likely candidate) by virtue of it appearing in its prompt. Something like “You are an expert in diseases yadda yadda. The symptoms of the patient are x y z. This reminds you of these diseases: X (symptoms 1), Y (symptoms 2), etc. What is your diagnosis?” Then you could feed the answer to this question into a critic prompt, and repeat until it reports no issues with the diagnosis. You can even make it “learn” by finetuning with LoRA, or by keeping notes it writes to itself.
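The lookup-then-critique loop could be sketched roughly like this. It isn't the ACE framework or any real library: the disease vectors are toy numbers standing in for real embeddings, and `llm` and `critic` are hypothetical callables you would back with an actual model.

```python
import math

# Toy "embeddings" for known diseases; in practice these would come from an
# embedding model. All names and numbers here are invented.
DISEASES = {
    "disease X": ([0.9, 0.1, 0.2], "fever, rash"),
    "disease Y": ([0.2, 0.8, 0.1], "cough, fatigue"),
    "disease Z": ([0.1, 0.2, 0.9], "joint pain"),
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec, k=2):
    """Return the k disease names whose vectors are closest to the query."""
    ranked = sorted(
        DISEASES,
        key=lambda name: cosine(query_vec, DISEASES[name][0]),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(symptoms, candidates):
    """Put the retrieved candidates into the prompt so the model 'knows' them."""
    reminders = ", ".join(f"{name} ({DISEASES[name][1]})" for name in candidates)
    return (
        f"You are an expert in diseases. The symptoms of the patient are {symptoms}. "
        f"This reminds you of these diseases: {reminders}. What is your diagnosis?"
    )

def diagnose(symptoms, query_vec, llm, critic, max_rounds=3):
    """Ask the model, then revise until the critic reports no issues."""
    prompt = build_prompt(symptoms, retrieve(query_vec))
    answer = llm(prompt)
    for _ in range(max_rounds):
        issues = critic(answer)
        if not issues:
            break
        answer = llm(f"{prompt}\nPrevious answer: {answer}\nIssues: {issues}\nRevise.")
    return answer
```

The `max_rounds` cap is my own addition so a critic that never says "no issues" can't loop forever.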

As for poorer data distributions, the magic of large language models (before which we just had “language models”) is that we’ve found that the larger we make them, and the more (high quality) data we feed them, the more intelligent and general they become. For instance, training them on multiple languages other than English somehow allows them to make more robust generalizations even just within English. There are a few papers I can recall which talk about a “phase transition” which happens during training where beforehand, the model seems to be literally memorizing its corpus, and afterwards (to anthropomorphize a bit) it suddenly “gets” it and that memorization is compressed into generalized understanding. This is why LLMs are applicable to more than just what they’ve been taught - you can eg give them rules to follow within the conversation which they’ve never seen before, and they are able to maintain that higher-order abstraction because of that rich generalization. This is also a major reason open source models, particularly quantizations and distillations, are so successful; the models they’re based on did the hard work of extracting higher-order semantic/geometric relations, and now making the model smaller has minimal impact on performance.
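For anyone unfamiliar with what "quantization" means mechanically, here's a minimal sketch of symmetric int8 quantization on a handful of made-up weights. Real schemes are more elaborate (per-channel or per-group scales, 4-bit formats, and so on), so treat this as the simplest possible illustration rather than how any particular model is shipped.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    # Fall back to a scale of 1.0 if every weight is zero.
    scale = max(abs(w) for w in weights) / 127 or 1.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float weights from the stored integers."""
    return [v * scale for v in quantized]

weights = [0.12, -0.53, 0.98, -0.07, 0.31]  # made-up values
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, max_error)
```

Each weight shrinks from 32 bits to 8, and the rounding error per weight is bounded by half the scale, which is part of why the relations the big model already learned mostly survive.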

@ConsciousCode@beehaw.org · 2 years ago

LLMs are not chatbots, they’re models. ChatGPT/Claude/Bard are chatbots which use LLMs as part of their implementation. I would argue in favor of the article because, while they aren’t particularly intelligent, they are general-purpose and exhibit some level of intelligence and thus qualify as “general intelligence”. Compare this to the opposite, an expert system like a chess computer. You can’t even begin to ask a chess computer to explain what a SQL statement does; the question doesn’t even make sense. But LLMs are capable of being applied to virtually any task which can be transcribed. Even if they aren’t particularly good, they at least attempt to complete the task and are often correct, unlike GPT-2, which read more like a Markov chain.

@ConsciousCode@beehaw.org · 2 years ago

Actually a really interesting article which makes me rethink my position somewhat. I guess I’ve unintentionally been promoting LLMs as AGI since GPT-3.5 - the problem is just with our definitions and how loose they are. People hear “AGI” and assume it would look and act like an AI in a movie, but if we break down the phrase, what is general intelligence if not applicability to most domains?

This very moment I’m working on a library for creating “semantic functions”, which lets you easily use an LLM almost like a semantic processor. You say await infer(f"List the names in this text: {text}") and it just does it. What most of the hype has ignored with LLMs is that they are not chatbots. They are causal autoregressive models of the joint probabilities of how language evolves over time, which is to say they can be used to build chatbots, but that’s the first and least interesting application.
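To give a feel for what a "semantic function" could look like, here's a minimal sketch. It is not the actual library: this `infer` takes an extra `backend` argument (an assumption on my part) so it can run offline, and `fake_backend` is a stand-in for a real LLM client.

```python
import asyncio

async def infer(prompt, backend):
    """Send a natural-language instruction to a text-completion backend.

    `backend` is any async callable that takes a prompt string and returns
    text. It is a hypothetical stand-in for a real LLM client.
    """
    completion = await backend(prompt)
    return completion.strip()

async def fake_backend(prompt):
    # Placeholder so the sketch runs without a model; a real backend would
    # call an LLM with the prompt.
    return " Alice, Bob \n"

async def main():
    text = "Alice met Bob at the station."
    names = await infer(f"List the names in this text: {text}", fake_backend)
    return [name.strip() for name in names.split(",")]

result = asyncio.run(main())
print(result)
```

The calling code treats the model like any other async function, with the prompt playing the role of the function body.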

So yeah, I guess it’s been AGI this whole time and I just didn’t realize it because they aren’t people, and I had assumed AGI implied personhood (which it doesn’t).

@ConsciousCode@beehaw.org · 2 years ago

It feels kind of hopeless now that we’d ever get something that feels so “radical”, but I’d like to remind people that 80+ hour work weeks without overtime used to be the norm before unions got us the 40 hour work week. It feels inevitable and hopeless until the moment we get that breakthrough, then it becomes the new norm.

@ConsciousCode@beehaw.org · 2 years ago

Good to note that this isn’t even hypothetical, it literally happened with cable. First it was ad-funded, then you paid to get rid of ads, then you paid exorbitant prices to get fed ads, and the final evolution was being required to pay $100+ for bundles including channels you’d never use to get at the one you would. It’s already happening to streaming services too, which have started to bundle.

@ConsciousCode@beehaw.org · 2 years ago

Bobby: “Caring is for suckers”
Peggy: “Bobby is TOO YOUNG to know that, Hank!”

I’m dying omfg.

Hank is the purest boy, we don’t deserve him

@ConsciousCode@beehaw.org · 2 years ago

I wonder what effects this will have with all these antitrust suits happening right as AI is ramping up, but before any of them have got any real foothold. Maybe Alexa will never get a brain and instead AI assistants will be seeded by the breakups or startups untarnished by the end stages of their shareholders parasitizing value?

@ConsciousCode@beehaw.org · 2 years ago

Nebula is so cheap I have a subscription even though I almost never use it. I would use it more if they had a better recommendation system. As it is now, you almost have to search for a specific video you want or dig through piles of random videos you don’t care about.

@ConsciousCode@beehaw.org · 2 years ago

Also worth considering with self-report that Gen Z may just be more open about their failings

@ConsciousCode@beehaw.org · 2 years ago

😮 Where’s the link? I don’t use Discord but I like free /s

@ConsciousCode@beehaw.org · 2 years ago

So how long until people start advocating for the nonhuman, nonsentient robot over actual PoC? “Silicon lives matter”?

@ConsciousCode@beehaw.org · 2 years ago

Can’t be a billionaire if you pass a certain threshold of self-awareness, it’s the rules.

@ConsciousCode@beehaw.org · 2 years ago

Hm, yeah I think you’re right. I was wondering why it wasn’t sitting right in my head. Deflation encourages hoarding because the value of each unit keeps increasing so if you spend now instead of later you lose some amount of potential value. I don’t think it was meant to be a scam though. In this case I’d consider it ignorance of the knock-on effects later exploited rather than an explicit conspiracy from the get-go.

@ConsciousCode@beehaw.org · 2 years ago

Bitcoin at least is inherently deflationary because there’s a fixed supply cap of 21 million bitcoins. Once all of those are mined, every transaction from then on has to use some fraction of a fraction of one of those, so each unit tends to increase in value over time. I should also note, I like Bitcoin as a proof of concept but don’t think it’s viable as a currency, and PoW isn’t viable as a consensus protocol (although it demonstrated that such consensus protocols are possible).
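The 21 million figure falls out of the issuance schedule: the block subsidy started at 50 BTC and halves every 210,000 blocks until it rounds down to zero. Here's a simplified sketch of that arithmetic; it ignores transaction fees and any coins that were never claimed, and the names are my own.

```python
SATOSHIS_PER_BTC = 100_000_000
BLOCKS_PER_HALVING = 210_000
INITIAL_SUBSIDY = 50 * SATOSHIS_PER_BTC  # block subsidy in satoshis

def total_supply_satoshis():
    """Sum the block subsidies over every halving era until the subsidy hits zero."""
    total = 0
    subsidy = INITIAL_SUBSIDY
    while subsidy > 0:
        total += BLOCKS_PER_HALVING * subsidy
        subsidy >>= 1  # integer halving; the subsidy is counted in whole satoshis
    return total

total = total_supply_satoshis()
print(total / SATOSHIS_PER_BTC)
```

Because the halving rounds down to whole satoshis, the total lands slightly under 21 million rather than exactly on it.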

@ConsciousCode@beehaw.org · edited 2 years ago

It’s fiat in the sense that nothing backs it beyond shared belief. I won’t argue it was ever going to be a good currency with built-in deflation, but that’s what it was originally meant to be. It’s long since become too volatile to be anything but a speculative asset, though. It does seem curious to me what that says about the actual distinction between legitimate currencies, stock options, and pyramid scheme buy-ins.