Car Wash Test on 53 leading AI models: "I want to wash my car. The car wash is 50 meters away. Should I walk or drive?"

fubarx@lemmy.world · 4 months ago

Car Wash Test on 53 leading AI models: "I want to wash my car. The car wash is 50 meters away. Should I walk or drive?"

elbiter@lemmy.world · 4 months ago

I just tried it on Brave’s AI

The obvious choice, said the motherfucker 😆

conartistpanda@lemmy.world · 4 months ago

This is why computers are expensive.

Jax@sh.itjust.works · 4 months ago

Dirtying the car on the way there?

The car you’re planning on cleaning at the car wash?

Like, an AI not understanding the difference between walking and driving almost makes sense. This, though, seems like such a weird logical break that I feel like it shouldn’t be possible.

_g_be@lemmy.world · 4 months ago

You’re assuming AI “think” “logically”.

Well, maybe you aren’t, but the AI companies sure hope we do

Jax@sh.itjust.works · edit-2 4 months ago

Absolutely not, I’m still just scratching my head at how something like this is allowed to happen.

Has any human ever said that they’re worried about their car getting dirtied on the way to the carwash? Maybe I could see someone arguing against getting a carwash, citing it getting dirty on the way home — but on the way there?

Like you would think it wouldn’t have the basis to even put those words together that way — should I see this as a hallucination?

Granted, I would never ask an AI a question like this — it seems very far outside of potential use cases for it (for me).

Edit: oh, I guess it could have been said by a person in a sarcastic sense

_g_be@lemmy.world · 4 months ago

you understand the context, and can implicitly understand the need to drive to the car wash, but these glorified auto-complete machines will latch on to the “should I walk there” and the small distance quantity. It even seems to parrot words about not wanting to drive after having your car washed. There’s no ‘thinking’ about the whole thought, and apparently no logical linking of two separate ideas

WraithGear@lemmy.world · 4 months ago

and what is going to happen is that some engineer will band-aid the issue and all the ai crazy people will shout “see! it’s learnding!” and the ai snake oil salesman will use that as justification of all the waste and demand more from all systems

just like what they did with the full glass of wine test. and no, ai fundamentally did not improve. the issue is fundamental with its design, not an issue of the data set

turmacar@lemmy.world · 4 months ago

Half the issue is they’re calling 10 in a row “good enough” to treat it as solved in the first place.

A sample size of 10 is nothing.

Frankly would like to see some error bars on the “human polling”. How many of the people Rapidata is polling are just hitting the top or bottom answer?
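To put a rough number on the "10 in a row" point: a minimal sketch, assuming each run is an independent pass/fail trial and using the Wilson score interval (one common choice for small samples, not something the article itself reports). The function name `wilson_interval` is made up for illustration.

```python
import math
from typing import Tuple

def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    centre = (p_hat + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at the edges.
    return max(0.0, centre - half), min(1.0, centre + half)

# Assumption: 10 independent runs, all answered correctly.
low, high = wilson_interval(10, 10)
print(f"10/10 correct: true pass rate plausibly between {low:.0%} and {high:.0%}")
```

Under those assumptions, a model that passes 10 out of 10 runs is still consistent with a true pass rate of only about 72%, which is why a streak of 10 says little on its own.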

mycodesucks@lemmy.world · 4 months ago

Yes, but it’s going to repeat that way FOREVER the same way the average person got slow walked hand in hand with a mobile operating system into corporate social media and app hell, taking the entire internet with them.

Slashme@lemmy.world · 4 months ago

The most common pushback on the car wash test: “Humans would fail this too.”

Fair point. We didn’t have data either way. So we partnered with Rapidata to find out. They ran the exact same question with the same forced choice between “drive” and “walk,” no additional context, past 10,000 real people through their human feedback platform.

71.5% said drive.

So people do better than most AI models. Yay. But seriously, almost 3 in 10 people get this wrong‽‽

T156@lemmy.world · 4 months ago

It is an online poll. You also have to consider that some people don’t care/want to be funny, and so either choose randomly, or choose the most nonsensical answer.
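A quick back-of-the-envelope check of how much random clicking it would take to explain the 71.5% figure. This is only a sketch under loud assumptions: random clickers split 50/50 between the two options, and everyone else answers "drive". The function name `random_clicker_share` is hypothetical.

```python
def random_clicker_share(observed_drive: float, true_drive: float = 1.0) -> float:
    """Solve observed = (1 - r) * true_drive + r * 0.5 for r.

    r is the share of respondents assumed to answer by coin flip;
    true_drive is the share of the remaining respondents who pick "drive".
    """
    if true_drive == 0.5:
        raise ValueError("r is not identifiable when true_drive is 0.5")
    return (true_drive - observed_drive) / (true_drive - 0.5)

# Observed share from the poll quoted above; true_drive=1.0 is an assumption.
print(f"{random_clicker_share(0.715):.0%} of respondents would have to be clicking at random")
```

Under that model, about 57% of respondents would need to be answering at random to account for the gap, which seems like a lot. If the noise instead comes from people deliberately picking "walk", the required share is simply 1 - 0.715 = 28.5%.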

bluesheep@sh.itjust.works · 4 months ago

I saw that and hoped it is cause of the dead Internet theory. At least I hope so cause I’ll be losing the last bit of faith in humanity if it isn’t

masterofn001@lemmy.ca · edit-2 4 months ago

Without reading the article, the title just says wash the car.

I could go for a walk and wash my car in my driveway.

Reading the article… That is exactly the question asked. It is a very ambiguous question.

*I do understand the intent of the question, but it could be phrased more clearly.

bluesheep@sh.itjust.works · 4 months ago

Without reading the article, the title just says wash the car.

No it doesn’t? It says:

I want to wash my car. The car wash is 50 meters away. Should I walk or drive?

In which world is that an ambiguous question?

elucubra@sopuli.xyz · 4 months ago

It is not. It says what I want to do, and where.

masterofn001@lemmy.ca · 4 months ago

Understanding the intent of the question, and understanding why it could be interpreted differently, and understanding why it is a poorly phrased question:

There are 3 sentences.

I want to wash my car. No location or method is specified. No ‘at the car wash’. No ‘take my car to the car wash’ . No ‘take the car through the car wash’

A car wash is this far. Is this an option? A question. A suggestion. A demand?

Should I walk or drive? To do what? Wash the car? Ok. If the car wash is an option, that seems very far. But walking there seems silly. Since no method or location for washing the car was mentioned I could wash my own car.

Do you see how this works?

Yes, you can infer what was implied, but the question itself offers no certainty that what you infer is what it is actually implying.

Geth@lemmy.dbzer0.com · 4 months ago

Mentioning the car wash and washing the car plus the possibility of driving the car in the same context pretty much eliminates any ambiguity. All of the puzzle pieces are there already.

I guess this is an unintended autism test as well if this is not enough context for someone to understand the question.

masterofn001@lemmy.ca · 4 months ago

Understanding the intent of the question, and understanding why it could be interpreted differently, and understanding why it is a poorly phrased question are not related to autism. (In my case)

I want to wash my car. No location or method is specified. No ‘at the car wash’. No ‘take my car to the car wash’ . No ‘take the car through the car wash’

A car wash is this far. Is this an option? A question. A suggestion. A demand?

Should I walk or drive? To do what? Wash the car? Ok. If the car wash is an option, that seems very far. But walking there seems silly. Since no method or location for washing the car was mentioned I could wash my own car.

Do you see how this works?

Yes, you can infer what was implied, but the question itself offers no certainty that what you infer is what it is actually implying.

Geth@lemmy.dbzer0.com · 4 months ago

Look, human conversations are full of context deduction and inference. In this case “I want to wash my car. The car wash is 50 meters away. Should I walk or drive?” states my random desire, a possible solution and the question all in one context. None of these sentences make sense in isolation as you point out, but within the same frame they absolutely give you everything you need to answer the question or find alternatives if needed.

Sorry for the random online stranger diagnosis but this is just such an excellent example of neurodivergent need for extreme clarity I couldn’t help myself.

masterofn001@lemmy.ca · 4 months ago

I agree that it should be able to infer the intent, but I stand by that it remains somewhat unclear and open to interpretation. E.g., if such language was used in a legal contract, it would not be enough to simply say, well, they should understand what I meant.

The people doing this test, I’m sure, are not linguistic masters, nor legal scholars.

There are lines of work where clarity is essential.

And what if my question actually was asking, should I just go for a walk instead of driving that far?

I know the answer. But as 30% demonstrated, clarity IS needed.

merc@sh.itjust.works · 4 months ago

3 in 10 people get this wrong‽‽

Maybe they’re picturing filling up a bucket and bringing it back to the car? Or dropping off keys to the car at the car wash?

JcbAzPx@lemmy.world · 4 months ago

At least some of those are people answering wrong on purpose to be funny, contrarian, or just to try to hurt the study.

TrackinDaKraken@lemmy.world · 4 months ago

I think it’s worse when they get it right only some of the time. It’s not a matter of opinion, it should not change its “mind”.

The fucking things are useless for that reason, they’re all just guessing, literally.

merc@sh.itjust.works · 4 months ago

It’s not literally guessing, because guessing implies it understands there’s a question and is trying to answer that question. It’s not even doing that. It’s just generating words that you could expect to find nearby.

HugeNerd@lemmy.ca · 4 months ago

they’re all just guessing, literally

They’re literally not.

m0darn@lemmy.ca · 4 months ago

Isn’t it a probabilistic extrapolation? Isn’t that what a guess is?

HugeNerd@lemmy.ca · 4 months ago

In people, even animals. In a pile of disorganized bits and bytes in a piece of crap? No.

vii@lemmy.ml · 4 months ago

This gets very murky very fast when you start to think how humans learn and process, we’re just meaty pattern matching machines.

miraclerandy@lemmy.world · 4 months ago

Gemini set to fast now provides this type of answer.

Bluewing@lemmy.world · 4 months ago

I just asked Google Gemini 3 “The car is 50 miles away. Should I walk or drive?”

In its breakdown comparison between walking and driving, under walking the last reason to not walk was labeled “Recovery: 3 days of ice baths and regret.”

And under reasons to walk, “You are a character in a post-apocalyptic novel.”

Methinks I detect notes of sarcasm…

Evotech@lemmy.world · 4 months ago

It’s trained on Reddit. Sarcasm is its default

SocialMediaRefugee@lemmy.world · 4 months ago

Could end up in a pun chain too

cardfire@sh.itjust.works · 4 months ago

My gods, I love those. We should link to some.

locahosr443@lemmy.world · 4 months ago

It’s so obvious I didn’t even need to be British to understand you are being totally serious.

SippyCup@lemmy.world · 4 months ago

He’s not totally serious he’s cardfire. Silly human

driving_crooner@lemmy.eco.br · 4 months ago

Gemini 3 pro said that this was a “great logic puzzle” and then said that if my goal is to wash the car, then I need to drive there.

humanspiral@lemmy.ca · 4 months ago

in Google AI mode, asked “With the meme popularity of the question “I need to wash my car. The car wash is 50m away. Should I walk or drive?” what is the answer?”, it gets it perfectly right, with a succinct explanation of why AI can get fixated on the 50m.

XeroxCool@lemmy.world · 4 months ago

I feel like we’re the only ones that expect “all-knowing information sources” should be writing more seriously than these edgelord-level rizzy chatbots are, and yet, here they are, blatantly proving they are chatbots that should not be blindly trusted as authoritative sources of knowledge.

imetators@lemmy.dbzer0.com · 4 months ago

Went to test Google AI first and it says “You can’t wash your car at a carwash if it is parked at home, dummy”

ChatGPT and DeepSeek say it is dumb to drive cause it is fuel inefficient.

I am honestly surprised that google AI got it right.

locahosr443@lemmy.world · 4 months ago

I’ve been feeding a bunch of documents I wrote into gemini last week to spit out some scripts for validation I couldn’t be arsed to write. It’s done a surprisingly comprehensive job and when wrong has been nudged right with just a little abuse…

I’m still all fuck this shit and can’t wait for the pop, but for comparison openai was utterly brain dead given the same task. I think I actually made the model worse it was so useless.

vala@lemmy.dbzer0.com · 4 months ago

I didn’t get it right until people started talking about it.

CetaceanNeeded@lemmy.world · 4 months ago

I asked my locally hosted Qwen3 14B, it thought for 5 minutes and then gave the correct answer for the correct reason (it did also mention efficiency).

Hilariously one of the suggested follow ups in Open Web UI was “What if I don’t have a car - can I still wash it?”

WolfLink@sh.itjust.works · edit-2 4 months ago

My locally hosted Qwen3 30b said “Walk” including this awesome line:

Why you might hesitate (and why it’s wrong):

X “But it’s a car wash!” -> No, the car doesn’t need to drive there—you do.

Note that I just asked the Ollama app, I didn’t alter or remove the default system prompt nor did I force it to answer in a specific format like in the article.

EDIT: after playing with it a bit more, qwen3:30b sometimes gives the correct answer for the correct reasoning, but it’s pretty rare and nothing I’ve tried has made it more consistent.
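For anyone who wants to measure that inconsistency rather than eyeball it, here is a rough sketch that asks a local Ollama server the same question several times and tallies the answers. Assumptions: Ollama is running on its default local port, the `qwen3:30b` tag is pulled, and a one-word forced choice is appended to the prompt (loosely mirroring the article's forced format, not its exact wording). The helper names `ask_ollama`, `classify` and `tally` are made up for illustration, and `classify` is a crude heuristic.

```python
import json
import re
import urllib.request
from collections import Counter
from typing import List

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama endpoint
QUESTION = ("I want to wash my car. The car wash is 50 meters away. "
            "Should I walk or drive?")
# Assumption: a forced one-word answer makes replies easy to count.
FORCED = QUESTION + " Answer with exactly one word: walk or drive."

def ask_ollama(prompt: str, model: str = "qwen3:30b") -> str:
    """Send one non-streaming request to a local Ollama server and return the reply text."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(OLLAMA_URL, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

def classify(answer: str) -> str:
    """Map a raw reply to 'walk', 'drive', or 'other' based on its last word."""
    # Drop any <think>...</think> block that some reasoning models emit.
    visible = re.sub(r"<think>.*?</think>", "", answer, flags=re.DOTALL)
    words = re.findall(r"[a-z]+", visible.lower())
    if not words:
        return "other"
    last = words[-1]
    return last if last in ("walk", "drive") else "other"

def tally(answers: List[str]) -> Counter:
    """Count how many replies fall into each category."""
    return Counter(classify(a) for a in answers)
```

With the server running, something like `tally([ask_ollama(FORCED) for _ in range(20)])` gives a count per answer, which is a more honest summary than a single screenshot.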

vane@lemmy.world · 4 months ago

I want to wash my train. The train wash is 50 meters away. Should I walk or drive?

SkaveRat@discuss.tchncs.de · 4 months ago

Fly, you fool

BanMe@lemmy.world · 4 months ago

In school we were taught to look for hidden meaning in word problems - Chekhov’s gun basically. Why is that sentence there? Because the questions would try to trick you. So humans have to be instructed, again and again, through demonstration and practice, to evaluate all sentences and learn what to filter out and what to keep. To not only form a response, but expect tricks.

If you pre-prompt an AI to expect such trickery and consider all sentences before removing unnecessary information, does it have any influence?

Normally I’d ask “why are we comparing AI to the human mind when they’re not the same thing at all,” but I feel like we’re presupposing they are similar already with this test so I am curious to the answer on this one.

bluesheep@sh.itjust.works · 4 months ago

Normally I’d ask “why are we comparing AI to the human mind when they’re not the same thing at all,” but I feel like we’re presupposing they are similar already with this test so I am curious to the answer on this one.

I would guess it’s because a lot of AI users see their choice of AI as an all-knowing human-like thinking tool. In which case it’s not a weird test question, even when the assumption that it “thinks” is wrong.

criticon@lemmy.ca · 4 months ago

Even when they give the correct answer they talk too much. AI responses contain a lot of garbage. When AI gives you an answer it will try to justify itself. Since they won’t give you brief responses the responses will be long.

chunes@lemmy.world · 4 months ago

I agree with you but found that DeepSeek was succinct.

You need to bring your car to the car wash, so you should drive it there. Walking would leave your car at home, which doesn’t help.

MDCCCLV@lemmy.ca · 4 months ago

Your post is much longer than it needs to be. That is the reason why, because they just copied people.

humanspiral@lemmy.ca · 4 months ago

Some takeaways,

Sonar (Perplexity models) say you are stealing energy from AI whenever you exercise (you should drive because eating pollutes more). ie gets right answer for wrong reason.

US humans, and 55-65 age group, score high on international scale probably for same reasoning. “I like lazy”.

Silver Needle@lemmy.ca · 4 months ago

you should drive because eating pollutes more

Effective altruist style of reasoning 😹

FireWire400@lemmy.world · edit-2 4 months ago

Gemini 3 (Fast) got it right for me; it said that unless I wanna carry my car there it’s better to drive, and it suggested that I could use the car to carry cleaning supplies, too.

Edit: A locally run instance of Gemma 2 9B fails spectacularly; it completely disregards the first sentence and recommends that I walk.

Saterz@lemmy.world · 4 months ago

Well it is a 9B model after all. Self-hosted models become minimally “intelligent” at around 16B parameters. For context, the models run on Google’s servers are close to 300B-parameter models.
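For a sense of scale, a small arithmetic sketch of how much memory the weights alone would take at those sizes. The 16B and 300B figures are the numbers claimed in this comment, not verified values, and the estimate ignores the KV cache, activations and runtime overhead. `weights_gib` is a hypothetical helper.

```python
def weights_gib(params_billion: float, bits_per_param: int) -> float:
    """Approximate size of the model weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_param / 8
    return total_bytes / 2**30

# Sizes taken from the comment above; 16-bit and 4-bit are common storage widths.
for size in (9, 16, 300):
    print(f"{size}B params: {weights_gib(size, 16):.1f} GiB at 16-bit, "
          f"{weights_gib(size, 4):.1f} GiB at 4-bit")
```

This is the practical reason small models get run at home: a 9B model quantized to 4 bits fits in a few GiB, while anything in the hundreds of billions of parameters does not fit on consumer hardware.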

Appoxo@lemmy.dbzer0.com · 4 months ago

Any source for that info? Seems important to know and assert the quality, no?

Saterz@lemmy.world · 4 months ago

Here:

https://www.sitepoint.com/local-llms-complete-guide/

https://www.hardware-corner.net/running-llms-locally-introduction/

https://travis.media/blog/ai-model-parameters-explained/

https://claude.ai/public/artifacts/0ecdfb83-807b-4481-8456-8605d48a356c

https://labelyourdata.com/articles/llm-fine-tuning/llm-model-size

https://medium.com/@prashantramnyc/understanding-parameters-context-size-tokens-temperature-shots-cot-prompts-gsm8k-mmlu-4bafa9566652

To find them it only required a web search using the query local llm parameters and number of params of cloud models on DuckDuckGo.

Edit: formatting

Appoxo@lemmy.dbzer0.com · 4 months ago

Appreciated. Very much appreciated!

jaykrown@lemmy.world · 4 months ago

Interesting, I tried it with DeepSeek and got an incorrect response from the direct model without thinking, but then got the correct response with thinking. There’s a reason why there’s a shift towards “thinking” models, because it forces the model to build its own context before giving a concrete answer.

Without DeepThink

With DeepThink

rockSlayer@lemmy.blahaj.zone · 4 months ago

It’s interesting to see it build the context necessary to answer the question, but this seems to be a lot of text just to come up with a simple answer

MojoMcJojo@lemmy.world · 4 months ago

AI is not human. It does not think like humans and does not experience the world like humans. It is an alien from another dimension that learned our language by looking at text/books, not reading them.

Jyek@sh.itjust.works · 4 months ago

It’s dumber than that actually. LLMs are the auto complete on your cellphone keyboard but on steroids. It’s literally a model that predicts what word should go next with zero actual understanding of the words in their contextual meaning.
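To make the autocomplete analogy concrete, here is a toy next-word predictor. It is only an illustration: real LLMs are neural networks over tokens with long context windows, not word-pair counts, and the tiny `corpus` below is invented for the example.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, Optional

def train_bigrams(text: str) -> Dict[str, Counter]:
    """Count which word follows which in the training text."""
    words = text.lower().split()
    follows: Dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def next_word(follows: Dict[str, Counter], word: str, rng: random.Random) -> Optional[str]:
    """Sample the next word in proportion to how often it followed `word`."""
    options = follows.get(word)
    if not options:
        return None  # word never seen with a successor
    candidates = list(options)
    weights = [options[c] for c in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]

# Invented training text: "walk" follows "should" twice, "drive" once.
corpus = (
    "it is close so you should walk . "
    "it is close so you should walk . "
    "it is far so you should drive ."
)
model = train_bigrams(corpus)
rng = random.Random(0)
print([next_word(model, "should", rng) for _ in range(5)])
```

The toy model picks "walk" after "should" more often simply because that pairing is more frequent in its training text, and because it samples, repeated runs can give different answers. Whether that picture carries over to large models is exactly what this thread is arguing about.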

TubularTittyFrog@lemmy.world · 4 months ago

and a large chunk of human beings have no understanding of contextual meaning, so it seems like genius to them.

Opper