Disclaimer: I wrote this article and made this website.
There was some talk of this issue in the recent fediverse inefficiencies thread. I’m hopeful that in the future we’ll have a decentralized solution for file hosting but for now I deeply believe that users should pay for their own file hosting.
Ok, hear me out.
We find the users with the slowest internet and start sending them all the data. They don’t have to keep anything on disk. Then they send it all back and forth between each other. Any time a user makes a request, we just wait for one of the slow nodes to come across the data and send it out.
We use the slowest wires for all the storage. It’s foolproof.
Somebody actually did make this as a joke years ago haha https://github.com/yarrick/pingfs
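For anyone curious, the trick (as I understand it) is that the data never rests on disk: every chunk is kept endlessly bouncing as ping payloads, and a read just waits for the next echo to come past. Here’s a toy simulation of that idea, with an `asyncio.sleep` standing in for the round trip — all the names are made up, and real pingfs uses raw ICMP sockets, not this:

```python
import asyncio

class WireStorage:
    """Toy pingfs: data is never at rest, it only exists as packets
    'in flight'. A sleep stands in for the network round trip."""

    def __init__(self, rtt=0.01):
        self.rtt = rtt          # simulated round-trip time (seconds)
        self.in_flight = {}     # chunk_id -> bouncing task
        self._echo = {}         # the "wire": last echo seen per chunk

    async def _bounce(self, chunk_id, payload):
        # Endlessly ping-pong the payload; cancelling the task "drops" it.
        while True:
            await asyncio.sleep(self.rtt)    # packet travels out and back
            self._echo[chunk_id] = payload   # echo arrives; resend at once

    def store(self, chunk_id, payload):
        self._echo[chunk_id] = payload
        self.in_flight[chunk_id] = asyncio.ensure_future(
            self._bounce(chunk_id, payload))

    async def read(self, chunk_id):
        # A reader has to wait for the next echo to come past.
        await asyncio.sleep(self.rtt)
        return self._echo[chunk_id]

async def demo():
    ws = WireStorage()
    ws.store(0, b"hello, slow wires")
    data = await ws.read(0)
    for task in ws.in_flight.values():
        task.cancel()           # stop pinging; the data evaporates
    return data

result = asyncio.run(demo())
```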
I was brushing my teeth when reading this comment and inadvertently ended up swallowing all my toothpaste.
don’t forget to spit
Ha! That’s awesome!
This is amazing and I love it.
Top 5 YouTube videos this one
Too wet for server racks in the forest.
They grew there
Look, I know they’re called “farms”, but like I told the last forest gnome, the dank woods is no place to host data.
Any way to use the torrent protocol somehow? Like Popcorn Time did?
IPFS would be similar, but more purpose-built.
Never heard of that before. It looks exactly built for this problem, better than torrent. Good call.
this is actually a super interesting idea of which i have never seen proof of concept. other than maybe freenet or something.
If it were easy to set up I’d definitely host a terabyte of Lemmy. As a pet.
What would you name your pet
IPFS?
What would an IPFS solution look like here? That’s a genuine question. I don’t have much experience with IPFS. It seems like it isn’t really used outside of blockchain applications.
The sustainability of it is questionable. If I’m not mistaken, IPFS is based on Ethereum, which has gone over to proof of stake rather than proof of work, but it’s still a pretty cumbersome system.
We’re talking about something that needs to compete with QUIC and Cloudflare. I’m not sure that Ethereum, or even crypto itself, is efficient enough as a content delivery method, so IPFS - though a nice idea - seems unrealistic.
But that’s just speculation from someone who has zero knowledge of IPFS as a technology and protocol, so take it with a grain of salt.
EDIT: honestly, why qualify with “I’m not sure” when besserwissers and their alts roam the fediverse instead of going to therapy. Smh. Give the people a TL;DR at least. I’m not here for long-form content.
IPFS has absolutely nothing whatsoever to do with Ethereum, or indeed any blockchain. It is a protocol for storing, distributing, and addressing data by hashes of the content over a peer-to-peer network.
There is, however, an initiative to create a commercial market for “pinning” (Filecoin), which is blockchain-based. It still has nothing to do with Ethereum, and is a distinct project that uses IPFS rather than being part of the protocol, thankfully. It is also not a “proof of work” sort of waste, but is built around proving that content that was promised to be stored is actually stored.
Pinning in IPFS is effectively “hosting” data permanently. IPFS is inherently peer-to-peer: content you access gets added to your local cache and gets served to any peer near you asking for it - like BitTorrent - until that cache is cleared to make space for new content you access. If nobody keeps a copy of some data you want others to access while your machines are offline, IPFS wouldn’t be particularly useful as a CDN. So peers on the network can choose to pin some data, exempting it from cache eviction. It is perfectly possible to offer pinning services that have nothing to do with Filecoin or the blockchain, and those exist already. But the organization developing IPFS wanted an independent blockchain-based solution simply because they felt it would scale better and give them a potential way to sustain themselves.
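To make the cache-vs-pin distinction concrete, here’s a toy sketch (not real IPFS code; `PinnedCache` and the digest-as-CID shortcut are my inventions): content is addressed by hash, old entries get evicted LRU-style to make room, but pinned entries are never touched.

```python
import hashlib
from collections import OrderedDict

class PinnedCache:
    """Toy IPFS-style node cache: content-addressed, LRU-evicted,
    except that pinned entries are never evicted."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()   # cid -> bytes, in recency order
        self.pinned = set()

    @staticmethod
    def cid(data):
        # Stand-in for a real CID: just a SHA-256 hex digest.
        return hashlib.sha256(data).hexdigest()

    def add(self, data):
        cid = self.cid(data)
        self.store[cid] = data
        self.store.move_to_end(cid)  # mark as most recently used
        self._evict()
        return cid

    def pin(self, cid):
        self.pinned.add(cid)

    def _evict(self):
        # Drop least-recently-used *unpinned* entries until under capacity.
        for cid in list(self.store):
            if len(self.store) <= self.capacity:
                break
            if cid not in self.pinned:
                del self.store[cid]

cache = PinnedCache(capacity=2)
a = cache.add(b"pinned forever")
cache.pin(a)
b = cache.add(b"passing through")
c = cache.add(b"newer content")   # forces eviction of b, never of a
kept = set(cache.store)
```

Even at capacity, the pinned block survives; only the unpinned cache entry gets dropped.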
Frankly, it was a bad idea then, as crypto grift was already becoming obvious. And it didn’t really take off. But since Filecoin has always been a completely separate thing from IPFS, it doesn’t affect how IPFS works in any way; IPFS continues to work as before.
There are many aspects of IPFS, the actual protocol, that could stand to be improved. But in a lot of ways it does do many of the things a Fediverse “CDN” should. That’s just the storage layer, though. Getting even the popular AP servers to agree to implement IPFS is going to be about as realistic an expectation as getting federated identity working on AP. A personal pessimistic view.
TL;DR, from Wikipedia:
IPFS allows users to host and receive content in a manner similar to BitTorrent. As opposed to a centrally located server, IPFS is built around a decentralized system of user-operators who hold a portion of the overall data. Any user in the network can serve a file by its content address, and other peers in the network can find and request that content from any node who has it using a distributed hash table (DHT).
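In toy form, “serve a file by its content address, found via a DHT” looks something like this. This is a deliberately crude Kademlia-flavored sketch (the `ToyDHT` class, the 64-bit keys, and the two-node replication factor are all my simplifications, not the real protocol): provider records live on the nodes whose IDs are XOR-closest to the content’s hash, and lookups ask those same nodes.

```python
import hashlib

def key(raw):
    # 64-bit key derived from a SHA-256 digest (toy stand-in for a CID).
    return int.from_bytes(hashlib.sha256(raw).digest()[:8], "big")

class ToyDHT:
    """Toy DHT: provider records are stored on the nodes whose IDs
    are XOR-closest to the content key, roughly as in Kademlia."""

    def __init__(self, node_ids):
        self.records = {n: {} for n in node_ids}  # node -> {key: providers}

    def closest(self, k, n=2):
        # XOR distance picks which nodes are "responsible" for a key.
        return sorted(self.records, key=lambda nid: nid ^ k)[:n]

    def provide(self, content, provider):
        k = key(content)
        for node in self.closest(k):
            self.records[node].setdefault(k, set()).add(provider)
        return k

    def find_providers(self, content):
        k = key(content)
        found = set()
        for node in self.closest(k):
            found |= self.records[node].get(k, set())
        return found

node_ids = [key(bytes([i])) for i in range(8)]
dht = ToyDHT(node_ids)
dht.provide(b"cat.png", provider="peer-A")
dht.provide(b"cat.png", provider="peer-B")
providers = dht.find_providers(b"cat.png")
```

Any node can then fetch `cat.png` from whichever provider answers, which is the BitTorrent-like part of the quote above.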
So it’s BitTorrent in the web browser… thanks. How is that supposed to compete with Cloudflare and QUIC again? It has the same network issues that the blockchain has, in that it will be cumbersome and slow - for anyone who doesn’t have millions to throw into infrastructure. Welcome to the same problem again, but in a different way.
Ironically, because browsers expose no raw UDP sockets, we can’t actually get proper p2p on the web - WebRTC through centralized coordination servers at best. Protocol Labs has all but given up on this use case in favor of using some bootstrapped selection of remote helper nodes.
Have you considered providing something like this: https://jortage.com/ and maybe contributing to their efforts to develop a specific API for that? Source code is here: https://github.com/jortage
Jortage is a really interesting approach. It definitely helps reduce the impact of the file hosting problem, but it doesn’t fully address the underlying cost issue: the cost of storing files grows every month, indefinitely, while donations typically don’t.
I would like to see a file hosting pool come to lemmy though. So I will look into it. :)
Pict-rs, which Lemmy uses to store images, already supports S3-type storage, so in theory it should work with Jortage, but I don’t think anybody has tested that yet. The people behind Feddit.org might have experimented with it, as they expressed interest a while back.
I think the major advantage is the deduplication - when an image goes viral across Mastodon (or Lemmy) it’s currently stored hundreds or thousands of times, each with its own cost. Do you dedupe (for either your customers’ benefit or your own)?
Are the images duplicated when shared? My understanding is that only a link to the file is replicated across servers and duplication comes from users manually uploading the same file to another server.
My website does not do any deduplication at this time.
Yes, for example go to https://infosec.exchange/explore
I see the top post as https://infosec.exchange/@nocontexttrek@mastodon.social/113433063621462027 and the image is https://media.infosec.exchange/infosec.exchange/cache/media_attachments/files/113/433/063/582/671/258/original/71da3801e4e4f08c.png
The link is to the original on https://files.mastodon.social/media_attachments/files/113/433/062/676/773/993/original/f828afef5cc7ed1c.png but when you click the image, the javascript loads a modal with the local cached version (the same image as the thumbnail that infosec.exchange loads).
There are lots of different codebases across the fediverse, so perhaps some hotlink, but local copies are the default.
The Lemmy server config indicates that this is an optional setting to improve user privacy, so requests from the client never hit the original server. Those cached files are only temporary and will be deleted after some time, so it’s not really full-blown duplication.
The default setting is to only generate thumbnails and store those locally (indefinitely?), but even that can be turned off. I checked, and it appears that lemmy.world has thumbnail generation disabled, so all images from other instances just link to the original on that instance.
Ok, so Lemmy doesn’t cause the same amount of duplication, but I’d still argue that dedupe is valuable: it saves on hosting costs (your costs, in this case) and users get a small advantage from slightly higher cache hit rates.
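For what it’s worth, the mechanism is simple enough to sketch: store blobs keyed by their content hash and reference-count them, so a viral image uploaded N times costs one copy. (`DedupStore` here is my toy illustration, not Jortage’s or pict-rs’s actual implementation.)

```python
import hashlib

class DedupStore:
    """Toy content-addressed media store: identical uploads are
    stored once and reference-counted."""

    def __init__(self):
        self.blobs = {}   # hash -> bytes (stored exactly once)
        self.refs = {}    # hash -> number of uploads referencing it

    def put(self, data):
        h = hashlib.sha256(data).hexdigest()
        if h not in self.blobs:
            self.blobs[h] = data           # first copy: actually stored
        self.refs[h] = self.refs.get(h, 0) + 1
        return h

    def delete(self, h):
        self.refs[h] -= 1
        if self.refs[h] == 0:              # last reference gone: free it
            del self.blobs[h]
            del self.refs[h]

    def bytes_stored(self):
        return sum(len(b) for b in self.blobs.values())

store = DedupStore()
viral = b"x" * 1000                  # the same image uploaded three times
h1 = store.put(viral)
h2 = store.put(viral)
h3 = store.put(viral)
stored = store.bytes_stored()        # one copy's worth, not three
```

Deleting one upload doesn’t touch the blob until the last reference is gone, which is what makes pooled storage across instances safe.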
For sure, I’ll add it to the list. :)
Is file hosting really a must? I mean, Reddit and feddit are basically forums, and not many forums allow file uploads. Also, we should have retention limits: low-value posts are allowed to fade away, while high-value posts with some level of interaction stay alive longer.
Reddit is basically entirely image or video posts, all hosted by Reddit directly.
This feels like something the Fediverse is ultimately going to build for itself. I know jack squat about the details, but it’s gonna have to be a thing eventually, I think.
I wish there was some version of PBS for Lemmy, like public funds for hosting. I’ll admit I haven’t really thought this through, so there’s probably some problems with my idea.
What is stopping some big giant, let’s say Yahoo/Verizon, from buying a shitload of storage and starting their own instance which is open to the public, but private in the sense that only Verizon employees are admins and mods? Only Verizon controls things. Then advertise to the point that the average person on the street knows that Verizon.Lemmy exists and associates Lemmy with being a Verizon thing. What is stopping big tech from pouring in the money required for this concept to take off, and using control over their instance to make the decentralized service centralized in the general public’s mind?
Right now Lemmy is 60k people. Ok. What if Lemmy was 200 million people, and only 60k knew it was a decentralized service? Everyone else just thought Verizon owned Lemmy?
This is literally exactly what happened to email. It didn’t go great.
I think Tenfingers could be an interesting option, as hosts do not know what they host, the data can be modified, and it’s 100% decentralised.
Is a p2p system for media, with the instances just hosting magnet links, too slow for fediverse purposes? To me this seems like the most resilient way to handle media in a decentralized system.
If a social network is to take off, it must be accessible from mobile devices behind CGNAT (carrier grade network address translation).
p2p from behind a CGNAT works just fine as long as a single server is accessible and can mediate connections between other peers. Most non-servers are behind some sort of NAT these days.
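The mediation step is tiny: the server just records the public (address, port) each peer’s NAT gave it, then hands each side the other’s observed endpoint so both can punch at once. A toy sketch of that rendezvous bookkeeping (no real sockets here, and the IPs are documentation examples; a real implementation reads the endpoint off the incoming UDP packet):

```python
class Rendezvous:
    """Toy rendezvous server for NAT hole punching: record the
    public endpoint each peer's NAT exposed, then introduce pairs."""

    def __init__(self):
        self.observed = {}   # peer_id -> (public_ip, public_port)

    def register(self, peer_id, public_endpoint):
        # In real life the server reads this from the UDP packet's
        # source address; here the caller passes it in.
        self.observed[peer_id] = public_endpoint

    def introduce(self, a, b):
        # Both peers now send to each other's public endpoint at the
        # same time, opening matching NAT mappings in both directions.
        return self.observed[b], self.observed[a]

rv = Rendezvous()
rv.register("alice", ("203.0.113.5", 40001))   # example addresses
rv.register("bob", ("198.51.100.9", 52344))
for_alice, for_bob = rv.introduce("alice", "bob")
```

Note the server never relays any media; it only brokers the introduction, which is why one reachable node is enough.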
Interesting approach, good luck! Admittedly, I’m not sure many users want to take their media hosting into their own hands and pay for it, but maybe I’m wrong. Where are the images stored? Do you have your own hardware? Backups, etc.?
Also since you’re interested in Fediverse media storage, I recently read about https://jortage.com/ It’s a third party storage for your instance with deduplication, pretty interesting idea. Takes away a bit of the federated part though
Thank you for writing this. Small typo: focued (focused).
Thanks for reading and pointing out that typo! (I fixed it)
I found this https://github.com/Marie673/Torrent_Proxy
It’s in Chinese so I don’t know if it’s what’s needed, but if each instance also acted as a torrent proxy, that’s decentralised, domain-agnostic file hosting that doesn’t break frontends, but also allows clients to update to do resource resolution themselves while helping to serve the fediverse’s files as a whole.
Am I the only one who would advocate for text-only storage? There should be text-only Lemmy instances which save only text and do not allow any kind of image posting or storage.
- text is less offensive to consume and moderate than evil images or video
- text is way more information dense and can be compressed even further! Truly the green, biosphere-friendly data format. I would be willing to save text-only data from strangers on my hard drive, but not images or video. It could even be valuable LLM training data.
Yes, people could post base64-encoded images, but that is a larger technical barrier and can be detected. If image storage is really needed, images should be heavily compressed (e.g. low-quality WebP), provided as links to external sites, and whenever possible SVG/vector graphics should be preferred.
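The “can be detected” part is a cheap heuristic: long unbroken base64 runs are rare in prose, so flag any run that decodes to bytes starting with a known image signature. A minimal sketch (the function name, the 200-character threshold, and the short magic-bytes list are my choices, not a standard):

```python
import base64
import re

# Magic bytes of common image formats (illustrative, not exhaustive):
# PNG, JPEG, GIF, and RIFF (WebP container).
IMAGE_MAGICS = [b"\x89PNG", b"\xff\xd8\xff", b"GIF8", b"RIFF"]

# Long unbroken runs of base64 characters are suspicious in a comment.
B64_RUN = re.compile(r"[A-Za-z0-9+/]{200,}={0,2}")

def contains_base64_image(text):
    """Heuristic: find long base64-looking runs and check whether they
    decode to bytes starting with a known image signature."""
    for run in B64_RUN.findall(text):
        try:
            # Re-pad to a multiple of 4 in case padding was stripped.
            decoded = base64.b64decode(run + "=" * (-len(run) % 4))
        except Exception:
            continue
        if any(decoded.startswith(m) for m in IMAGE_MAGICS):
            return True
    return False

fake_png = b"\x89PNG\r\n\x1a\n" + b"\x00" * 300
smuggled = "look at this: " + base64.b64encode(fake_png).decode()
flag_img = contains_base64_image(smuggled)
flag_txt = contains_base64_image("just a normal comment about lemmy")
```

It won’t catch someone who splits or re-encodes the payload, but it raises the bar enough for a text-only instance to enforce its policy in practice.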