DeepSeek collects keystroke data and more, storing it in Chinese servers

@restingboredface@sh.itjust.works · 3 months ago

DeepSeek collects keystroke data and more, storing it in Chinese servers

@Imprint9816@lemmy.dbzer0.com · 3 months ago

Run it locally then

@horse_battery_staple@lemmy.world · 3 months ago

https://www.smartprix.com/bytes/how-to-run-deepseek-ai-locally-on-your-phone-2-methods/

@ZeDoTelhado@lemmy.world · edit-2 3 months ago

Thanks, managed to have it installed locally bia pocket pal (termux was giving me errors constantly on compile). Out of curiosity, I made a very “interesting” prompt, and frankly I am not even surprised

EDIT: decided to be a little spicier, didn’t fail to amuse me

@markinov@lemmygrad.ml · 3 months ago

@Grapho@lemmy.ml · 3 months ago

That so edgy, man. I bet all the girls in your high school think you’re the raddest.

@grey_maniac@lemmy.ca · 3 months ago

I’m confused. Isn’t “collecting keystroke data” just an alarmist way to describe text entry?

@noisefree@lemmy.world · 3 months ago

Maybe. They could also be doing things like paying attention to input cadence and typos/pre-send typo corrections to use as part of a fingerprint associated with the identifying information a user gives them when creating an account so that they can then attempt to detect the user elsewhere on the web whether they are using an identifying account or not.

☆ Yσɠƚԋσʂ ☆ · 3 months ago

This argument applies to literally every single web app you use.

@Melvin_Ferd@lemmy.world · 3 months ago

How far we’ve come

@uis@lemm.ee · 3 months ago

Not exactly. Timing between key presses can be used to identify people.

@grey_maniac@lemmy.ca · edit-2 3 months ago

I am literally so paranoid I regularly vary my keysteoke rhythms and explore polyrhytmic techniques to create variations. Not even joking.

@vfreire85@lemmy.ml · 3 months ago

this. i mean, the session logs for the prompt are kept at least for your user, right?

@Ferk@lemmy.ml · edit-2 3 months ago

This is the full paragraph:

We collect certain device and network connection information when you access the Service. This information includes your device model, operating system, keystroke patterns or rhythms, IP address, and system language. We also collect service-related, diagnostic, and performance information, including crash reports and performance logs. We automatically assign you a device ID and user ID. Where you log-in from multiple devices, we use information such as your device ID and user ID to identify your activity across devices to give you a seamless log-in experience and for security purposes.

It looks to me that they are using it to identify the user uniquely, maybe also related to captcha to prevent bots (it’s common practice to capture mouse and keyboard while resolving captchas to see if the movement is human-like).

@grey_maniac@lemmy.ca · 3 months ago

Looks like there are more things I need to start randomizing and injecting with noise.

@tux@lemmy.world · 3 months ago

Not usually. Keystroke info is different than text input, like if you didn’t click onto any field and typed it would only be captured if keystroke are all being grabbed. It’s especially scary if you keep the app running in the bg and then type something and it still captures it. Not saying they’re doing that, but the privacy policy says they might.

The rhythm part is annoying, it’s commonly used to ID people even through things like ad blocks and dns blocks. Could also (in theory) be used to capture what people are typing just by hearing how they type.

@AbouBenAdhem@lemmy.world · edit-2 3 months ago

Anyone using DeepSeek as a service the same way proprietary LLMs like ChatGPT are used is missing the point. The game-changer isn’t that a Chinese company like DeepSeek can compete with OpenAI and its ilk—it’s that, thanks to DeepSeek, any organization with a few million dollars to train and host their own model can now compete with OpenAI.

@WalnutLum@lemmy.ml · 3 months ago

Or open source groups can make a fully open repro of it: https://github.com/huggingface/open-r1

@naeap@sopuli.xyz · 3 months ago

I’d like to look into that, how can I train an existing model further?

I’m only playing around with ollama, but like to do a bit more - mostly just to fulfill my needs to understand things - but have no idea where to start

@WalnutLum@lemmy.ml · 3 months ago

You’re going to have to learn python.

Here’s a good overview: https://huggingface.co/docs/transformers/training

@naeap@sopuli.xyz · edit-2 3 months ago

Python is not a problem
SW Dev is my job. Just never had real contact with AI before, besides playing around a bit.

Thank you very much for the link!!

Edit: thank you very much again, that was pretty much exactly what I was looking for.
Don’t know how I missed to checkout huggingface. Thought of it always just as a github for models and didn’t bother checking for docs…
But that’s a great intro with simple tools/tutorials to get a grip on it, thanks!

@Treczoks@lemmy.world · 3 months ago

“We store the information we collect in secure servers located in the People’s Republic of China”

Now you Americans know how we Europeans feel when Google, Amazon and Facebook store our information on American servers. Hint: The protective wall between Chinese servers and their government are about as good as the one between American servers and their government - at least for non-US citizens. The last thin veil of privacy for Eurpeans has been ripped to shreds by Trump last week.

@Ferk@lemmy.ml · edit-2 3 months ago

The last thin veil of privacy for Eurpeans has been ripped to shreds by Trump last week.

What did he do? I know Trump does not like the GDPR, but did he sign something affecting it last week?

@Treczoks@lemmy.world · 3 months ago

He killed the EU-US Data Privacy Framework. Theoretically, no company is allowed to transfer data of European citizens to US-based servers anymore. Sadly, Ursula von der Leyen is lacking the balls to act on this.

@Ferk@lemmy.ml · edit-2 3 months ago

Thanks, I did not know. I think you are referring to this: https://www.freevacy.com/news/noyb/trumps-actions-to-dismantle-pclob-threatens-eu-us-data-transfers/6088

To be completely honest… as an European I would be happy if they actually did make it so that no EU-US data transfer were allowed… we need to stop depending on all these US-based services… but like you said, they probably don’t have the balls to pull the plug. Which makes me wonder if that board was actually really any protection at all for privacy or it had always been an empty shell used as an excuse on both sides just to keep up appearances and maintain the plug on.

I honestly think this could be a win for us. Worst case scenario, nothing really changes but some masks fall off and at least some people would stop acting under false pretense (which could open the doors for change). So I’m actually glad he did that.

Fuck Work · 3 months ago

At least its not stored on american servers.

@JonEFive@midwest.social · 3 months ago

I feel like Meta could do a ton more damage with my information than Tencent

ozoned · 3 months ago

Chinese company does what American companies have done for 25+ years now!

Is it time for REAL data privacy laws or are we just gonna keep playing whack-a-mole with Chinese tech companies that get us nowhere?

@Someonelol@lemmy.dbzer0.com · 3 months ago

Our data’s just too valuable for these parasites. Data privacy laws may eventually pass to compel software companies to store everything in US servers only.

ozoned · 3 months ago

Excellent Point. If that’s the case though, then wouldn’t other countries follow suit which still limits big tech’s reach and makes them less profitable and less powerful? Idk. Guess we’ll see how it plays out. Either way, I’m staying as far from those ecosystems as possible to at least try to mitigate some of what they do. I’ll never be totally successful, genie is put of the bottle, but we can at least attempt.

Subverb · 3 months ago

If you think the American companies do anything different you’re not paying attention and simply believing the propaganda.

@mavu@discuss.tchncs.de · 3 months ago

It’s a chinese company, where else would they store the data?

@ShinkanTrain@lemmy.ml · 3 months ago

The balls.

@smb@lemmy.ml · 3 months ago

I think its called a data lake, so they don’t “store” it, its rather floating around there 🤪

@howrar@lemmy.ca · 3 months ago

These lakes are formed when the cloud is saturated and gives us data precipitation.

@smb@lemmy.ml · edit-2 3 months ago

thanks for the great picture 👍

so here is the current cloud clima forecast:

The saturated clouds will rain into the data lakes that are already overspilling here and there into the ransomstreams already taking all soil in their way with them. During the day there will be security clouds preventing from visible rain only while during the night those same security clouds rain themselves all collected data to their homelake while their homelake security already is corrupted and spills over regulary.

As soon as the fort-cisc-pal-ocstricken-redm-ondams breach it’ll gonna have floods with multi-exabyte waveheights and the ripples of the release will be felt over to far east china and the currents will circulate around the world multiple times causing damage and devastation in their wake around the world and eventually even reach connected orbit.

The floods will have the potential to also wash away and /or drown or choke all the big tech dinosaurs. Only small foss mammals and deep sea amphibics will survive this historic event.

… you kinda asked for it 😉 same as “they” kinda asked for it too. 🤔

@Critical_Thinker@lemm.ee · 3 months ago

Antarctica, clearly.

ZeroOne · 3 months ago

Nope, At least we can check DeepSeek’s source code

Unlike OpenAI… oops I meant ClosedAI

@JOMusic@lemmy.ml · 3 months ago

This article is what US propaganda looks like folks. Mashable should be ashamed.

Literally all AI companies do this to run their services. Except you can actually download Deepseek and run it completely securely on your own devices. You know who doesn’t allow that security? OpenAI and the other US companies currently being screwed.

@zeca@lemmy.eco.br · 3 months ago

every google site has been doing this for years too. every comment we write in youtube and discard before posting, its being recorded. this isnt news at all.

@GrumpyDuckling@sh.itjust.works · 3 months ago

Your devices keyboard app has been collecting all of your keystrokes.

Preston Maness ☭ · 3 months ago

Billions of folk’s keyboards are connected to the internet and the vast majority of them have no idea. It’s absolutely ludicrous that we’ve gotten to this stage with surveillance capitalism. Internet-connected keyboards are malware, plain and simple.

ArchRecord · 3 months ago

the company states that it may share user information to "comply with applicable law, legal process, or government requests.

Literally every company’s privacy policy here in the US basically just says that too.

Not only does DeepSeek collect “text or audio input, prompt, uploaded files, feedback, chat history, or other content that [the user] provide[s] to our model and Services,” but it also collects information from your device, including “device model, operating system, keystroke patterns or rhythms, IP address, and system language.”

Breaking news, company with chatbot you send messages to uses and stores the messages you send, and also does what practically every other app does for demographic statistics gathering and optimizations.

Companies with AI models like Google, Meta, and OpenAI collect similar troves of information, but their privacy policies do not mention collecting keystrokes. There’s also the added issue that DeepSeek sends your user data straight to Chinese servers.

They didn’t use the word keystrokes, therefore they don’t collect them? Of course they collect keystrokes, how else would you type anything into these apps?

In DeepSeek’s privacy policy, there’s no mention of the security of its servers. There’s nothing about whether data is encrypted, either stored or in transmission, and zero information about safeguards to prevent unauthorized access.

This is the only thing that seems disturbing to me, compared to what we’d like to expect based on the context of what DeepSeek is. Of course, this was proven recently in practice to be terrible policy, so I assume they might shore up their defenses a bit.

All the articles that talk about this as if it’s some big revelation just boil down to “company does exactly what every other big tech company does in America, except in China”

@tux@lemmy.world · 3 months ago

Collecting keystrokes is very different from collecting text inputted into fields. Keystroke rhythms is even more alarming as that is often used to identify users despite them using privacy settings, or used to collect what’s typed via audio collection.

Your argument that this is no different than other apps is complete crap. Don’t trust any app that collects that information

@Ferk@lemmy.ml · edit-2 3 months ago

The argument stands, though.

Yes, not ALL other apps do that, but the comment was specifically talking about companies like Google and Meta… they definitely do collect incomplete strings from search forms (down to individual characters) when they display search suggestions, for example. They might not mention “keystrokes” in the legal text, but I don’t see why they wouldn’t be able to extrapolate your typing pattern since they do have the timing information which should be enough data to, at some level, profile it.

@tux@lemmy.world · 3 months ago

Keystrokes don’t have to be in a text field or input. That’s my point.

If I’m on say google. And I type anything into the field it’s definitely capturing it. You know this for no other reason then it would have to be with autocomplete as an option.

Keystroke capturing is the same as keylogging, aka anything typed even if it’s not into a place where you would assume it’s being seen by the app. Aka, if I had an app open in the background and was typing in my password, it would see and capture that.

They’re completely different things. While the privacy issues of US large tech companies are abundant and awful, there is a large difference between keystroke capturing and capturing input via fields. Especially when you’re agreeing to allow them to process and transfer or even sell that information.

@Ferk@lemmy.ml · edit-2 3 months ago

But that’s not what the terms on both Google/Meta and Deepseek say.

There’s no term in their ToS saying Google/Meta restricts the data collection to forms, which means that if the ToS allowed them to collect them from forms (and as you admitted, we do know for a fact that they do), then it also allows them to collect it outside of forms. The reason I put the search suggestions as example is because it’s one we CAN know (and thank you for agreeing on that), but that doesn’t mean they don’t do other captures at times we DON’T know… and also it’s not the only place, Google owns several captcha mechanisms and capturing input patterns is common on those too (and captchas capture outside forms too!). Another obvious example is Google docs, another is Google translate… and again, those are only the obvious ones, we don’t know if there are non-obvious ones.

In the other direction too, Deepseek terms don’t say it does it outside of forms either. You are jumping into assumptions by saying it acts the same as a traditional keylogger and that the keystrokes are captured for “anything typed”. For all we know the only place they might be capturing is when the user is in very specific steps of the login process, maybe for captcha purposes too, or specific forms for preloading results, etc. There’s no reason you should trust they do it any less/more than Google/Meta does, the ToS in both have the same lack of information in that respect.

You can only make assumptions one way or the other, since the terms are not specific on what exactly they allow themselves to do, in the case of Google/Meta they’re so sneaky that they avoid saying they do capture them (even though they do, as you yourself admitted), while in the case of Deepseek, even though they are a bit more specific by using the word “keystrokes”, they also don’t specify where/when/why (other than “to give you a seamless log-in experience and for security purposes” …but that’s also unclear wording).

@leanleft@lemmy.ml · 3 months ago

other ai services do too. u might not realize it.

Mojeek Search Engine · 3 months ago

haha, now do openai

The 8232 Project · 3 months ago

HuggingChat is open source and lets you use DeepSeek. It also doesn’t censor results like the main app (allegedly) does.

@Ledivin@lemmy.world · 3 months ago

Nothing alleged about it. The main app wraps your prompt in a China-friendly one - at this point, I think people have mined the prompt itself? Scummy, sure, but it’s also the same way that literally every other online AI service works.

@frozenspinach@lemmy.ml · edit-2 3 months ago

Nothing alleged about it. The main app wraps your prompt in a China-friendly one

I asked it about whether the takeover of Hong Kong was met with international criticism. First I saw an answer saying yes, and a few paragraphs of examples and elaborations.

A few minutes later the answer I already saw was replaced with “sorry, that’s outside of my scope.” I think with the flood of new traffic to Deepseek, they are scaling up reviews of chat content.

@frozenspinach@lemmy.ml · edit-2 3 months ago

at this point, I think people have mined the prompt itself?

Would be interested in any additional info on this.

@Sir_Kevin@lemmy.dbzer0.com · 3 months ago

This is amazing!

@zifk@sh.itjust.works · 3 months ago

Annoyingly, some of the censorship is baked into the model, it still won’t answer all question about china.

@simple@lemm.ee · 3 months ago

HuggingChat is open source and lets you use DeepSeek.

Very misleading, it lets you use the lighter, watered-down version (Deepseek 32B) compared to the large impressive model they have (Deepseek 671B)