Mozilla Firefox new alt-text generator powered by "fully private on-device AI model"

frogman [he/him] · 1 year ago

@ColdWater@lemmy.ca · 1 year ago

Babe, another pointless AI just dropped

SSUPII · 1 year ago

Think AI is pointless when it doesn’t apply to you?

@Zworf@beehaw.org · 1 year ago

If you had a visual disability you would certainly think otherwise.

@grrgyle@slrpnk.net · 1 year ago

Tell me you don’t add alt text to your posts without telling me :p

@pr06lefs@lemmy.ml · 1 year ago

I like this approach of keeping the model on-device and running it locally. I’ve been using the Firefox website translator and it’s great. Handy, and it doesn’t send my data to Google. That I know of, ha.

@ClassifiedPancake@discuss.tchncs.de · 1 year ago

When I used a similar feature in Ice Cubes (Mastodon app), it generated very detailed but ultimately useless text, because it does not understand the point of the image and focuses on things that don’t matter. Could be better here, but I doubt it. I prefer writing my own alt text, but it’s better than nothing.

@jherazob@beehaw.org · 1 year ago

Now I want this standalone as a command-line binary: take an image and give me a single-phrase description (gut feeling says this already exists, but depending on Teh Cloudz and OpenAI, not fully local on-device for non-GPU-powered computers).

umami_wasabi · 1 year ago

Ollama + llava-llama3

Now you just need a CLI wrapper to interact with the Ollama API.
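A minimal sketch of the kind of CLI wrapper meant here, assuming an Ollama server running locally on its default port (11434) with the `llava-llama3` model already pulled (`ollama pull llava-llama3`). The script itself, the prompt wording, and the `--model` flag are illustrative choices, not an existing tool. It calls Ollama’s `/api/generate` endpoint with `stream` set to false, so the reply comes back as a single JSON object whose `response` field holds the generated text.

```python
"""Hypothetical `describe_image.py`: a sketch, not an existing package.

Assumes a local Ollama server on its default port with a vision model
such as llava-llama3 already downloaded.
"""
import argparse
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint
DEFAULT_MODEL = "llava-llama3"
PROMPT = "Describe this image in one short sentence."  # assumed prompt wording


def build_payload(image_bytes: bytes, model: str = DEFAULT_MODEL) -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint."""
    return {
        "model": model,
        "prompt": PROMPT,
        # Ollama takes images as a list of base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        # Return one complete JSON object instead of a token stream.
        "stream": False,
    }


def describe(path: str, model: str = DEFAULT_MODEL) -> str:
    """Send the image at `path` to the local Ollama server and return its text."""
    with open(path, "rb") as f:
        payload = build_payload(f.read(), model)
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"].strip()


def main() -> None:
    parser = argparse.ArgumentParser(
        description="Describe an image with a local Ollama vision model."
    )
    parser.add_argument("image", nargs="?", help="path to the image file")
    parser.add_argument("--model", default=DEFAULT_MODEL, help="Ollama model name")
    args = parser.parse_args()
    if args.image is None:
        parser.print_usage()  # nothing to describe; don't contact the server
        return
    print(describe(args.image, args.model))


if __name__ == "__main__":
    main()
```

Usage would be something like `python describe_image.py photo.jpg` (file names hypothetical). This still needs Ollama installed and the model downloaded, and how fast it runs on a CPU-only office machine will depend heavily on the hardware.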

@jherazob@beehaw.org · 1 year ago

So, it’s possible to build but no one has made it yet? Because I have negative interest in messing with that kinda tech and would rather just “apt-get install whatever-image-describing-gizmo”, so I wouldn’t be the one who does it.

@Swedneck@discuss.tchncs.de · 1 year ago

this is how i feel about basically all technology nowadays, it’s all so artificially limited by capitalism.

nothing fucking progresses unless someone figures out a way to monetize it or an autistic furry decides to revolutionize things in a weekend because they were bored and inventing god was almost stimulating enough

The Doctor · 1 year ago

Folks have made it. I think Ollama was name-checked specifically because it’s on GitHub, in Homebrew, and in some distros’ package repositories (it’s definitely in Arch’s). I think some folks (at least) aren’t talking about it because of the general hate-on folks have for LLMs these days.

@jherazob@beehaw.org · 1 year ago

I don’t want an LLM to chat with or whatever folks do with those things. I want a command I can just install: I call the binary in a terminal window with an image of some sort as a parameter, and it returns a single phrase describing the image, on a typical office machine with no significant GPU and zero internet access.

Right now I cannot do this, as far as I know. Pointing me at some LLM and saying “Go build yourself something with that” is the direct opposite of what I said I want. So it doesn’t currently seem to exist; that’s why I said I wished somebody would rip it out of the Firefox source and make it a standalone command.

umami_wasabi · 1 year ago

And you expect someone to just do it for you? You already get the inferencing engine and the model for free, mate.

The Doctor · 1 year ago

I thought that feature was built into it, but okay.

@Zworf@beehaw.org · 1 year ago

Yes, I was just writing that. I would love to see more integrations that can talk to Ollama.

@Kissaki@beehaw.org · 1 year ago

From your OP description:

EDIT: the AI creates an initial description, which then receives crowdsourced additional context per-image to improve generated output. look for the “Example Output” heading in the article.

That’s wrong. There is nothing crowdsourced. What you read in the article is that when you add an image in the PDF editor, it can generate alt text for the image, and you as the user validate and confirm it. That’s still local PDF editing, though.

The caching part is about the model dataset, which is static.

frogman [he/him] · 1 year ago

my bad, i misunderstood. thanks.

@Kissaki@beehaw.org · 1 year ago

So, the planned experimentation and availability:

PDF editor when adding an image in Firefox 130
PDF reading
[hopefully] general web browsing

Sounds like a good plan.

Once quantized, these models can be under 200MB on disk, and run in a couple of seconds on a laptop – a big reduction compared to the gigabytes and resources an LLM requires.

While that’s a reasonable size for laptops and desktops, the couple of seconds of processing time could still be a bit of a hindrance. Nevertheless, it’s a significant unblock for blind/text users.

I wonder what it would mean for mobile. If it’s an optional accessibility feature, then given today’s smartphone storage space, I think it could work well.
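A back-of-the-envelope look at why quantization matters for the “under 200MB” figure: on-disk weight size is roughly parameter count × bytes per parameter, so going from 32-bit floats to 8-bit integers cuts it by about 4×. The 180-million parameter count below is a made-up placeholder, not the actual size of Firefox’s model, and the toy symmetric int8 quantizer only illustrates the general idea; the quoted passage doesn’t say which scheme Mozilla uses.

```python
# Rough weight-storage arithmetic plus a toy int8 quantizer.
# HYPOTHETICAL_PARAMS is a placeholder, not Firefox's real model size.
from typing import List, Tuple

BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def model_size_mb(num_params: int, dtype: str) -> float:
    """Approximate weight storage in MB (10**6 bytes), ignoring file metadata."""
    return num_params * BYTES_PER_PARAM[dtype] / 1_000_000


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor int8 quantization: w ≈ scale * q, q in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # avoid a zero scale
    return [round(w / scale) for w in weights], scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map int8 codes back to approximate float weights."""
    return [scale * v for v in q]


HYPOTHETICAL_PARAMS = 180_000_000
for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>8}: {model_size_mb(HYPOTHETICAL_PARAMS, dtype):.0f} MB")
```

With that placeholder count it prints 720 MB, 360 MB, and 180 MB for float32, float16, and int8; each int8 weight is stored as one byte plus a shared scale, at the cost of some rounding error.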

Running inference locally with small models offers many advantages:

They list 5 positives about using local models. On a blog targeting developers, I would wish, if not expect, them to list the downsides and weigh the two sides too. As it is, it’s promotional material rather than an honest, open, fully informative description.

While they go into detail about the architecture and implementation, I think the negatives are noteworthy too, and weighing them could be insightful for readers.

So every time an image is added, we get an array of pixels we pass to the ML engine

~~An array of pixels doesn’t make sense to me. Images can have different widths, so linear data with varying sectioning content would be awful for training.~~

~~I have to assume this was a technical simplification or unintended wording mistake for the article.~~

@grrgyle@slrpnk.net · 1 year ago

I imagine it’s a 2D array? So width would be captured by uhh like a[N].len.

It could be I’m misunderstanding you, because I’m not sure what you mean by:

linear data with varying sectioning content

@Kissaki@beehaw.org · 1 year ago

Looking at Wikipedia on arrays, I think I’m just not used to “array” as a term for multi-dimensional data structures. TIL
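On the “array of pixels” question: a flat array is unambiguous as long as the width travels with it. The browser’s canvas `ImageData`, for example, exposes pixels as one flat, row-major array of RGBA bytes plus separate `width` and `height`. The sketch below indexes into such a buffer and reshapes it into the 2D form described above; the helper names are hypothetical, and whether Firefox’s alt-text pipeline uses exactly this layout is an assumption.

```python
# Indexing a flat, row-major RGBA buffer (hypothetical helpers).
from typing import List, Tuple

Pixel = Tuple[int, ...]  # (R, G, B, A), one byte each


def pixel_at(data: bytes, width: int, x: int, y: int) -> Pixel:
    """Return pixel (x, y): rows are stored one after another, 4 bytes per pixel."""
    i = (y * width + x) * 4
    return tuple(data[i:i + 4])


def to_rows(data: bytes, width: int, height: int) -> List[List[Pixel]]:
    """Reshape the flat buffer into `height` rows of `width` pixels each."""
    if len(data) != width * height * 4:
        raise ValueError("buffer length must equal width * height * 4")
    return [[pixel_at(data, width, x, y) for x in range(width)] for y in range(height)]


# A 2x2 test image whose bytes are simply 0..15.
flat = bytes(range(16))
rows = to_rows(flat, width=2, height=2)
print(pixel_at(flat, 2, x=0, y=1))  # (8, 9, 10, 11)
print(len(rows), len(rows[0]))      # 2 2
```

Here `len(rows[0])` plays the role of `a[N].len`. Separately, image models commonly resize input to a fixed resolution before inference, so varying image widths don’t end up as varying input shapes.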

@pheet@sopuli.xyz · 1 year ago

Might be a significant issue if more applications adopt these kinds of features and can’t share the resources in a meaningful way.

@Even_Adder@lemmy.dbzer0.com · 1 year ago

I hope this’ll be useful for me. I wonder how it compares to LLaVA?

@Zworf@beehaw.org · 1 year ago

One thing I’d love to see in Firefox is a way to offload the translation engine to my local ollama server. This way I can get much better translations but still have everything private.

@leanleft@lemmy.ml · 11 months ago

There are way more companies who want to text-mine user content than there are blind people using the internet to read my content.

Experimenting with local alt text generation in Firefox Nightly – Mozilla Hacks - the Web developer blog