OneInterface/realtime-bakllava

llama.cpp with the BakLLaVA model describes what it sees

/ 100

Emerging

This project helps you understand what is happening in an image or a real-time video feed by describing it in text. You give it a picture or a live webcam stream, and it tells you what it "sees" by generating natural-language captions. It suits anyone who needs immediate, descriptive insight into visual information, for example people working in accessibility or content analysis.
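To illustrate the request/response loop described above, here is a minimal sketch of sending a single frame to a local llama.cpp server running a LLaVA-family model such as BakLLaVA. This is not the project's own code. It assumes an older llama.cpp `server` build that accepted an `image_data` array on its `/completion` endpoint, with `[img-<id>]` tags in the prompt marking where each image goes (later llama.cpp releases changed how multimodal input is handled). The URL, the `USER:`/`ASSISTANT:` prompt template and the helper names `build_completion_payload` and `describe_frame` are hypothetical.

```python
import base64
import json
import urllib.request

# Assumption: a llama.cpp `server` with LLaVA-style multimodal support is
# listening locally. This URL is hypothetical; adjust it to your setup.
SERVER_URL = "http://localhost:8080/completion"


def build_completion_payload(image_bytes: bytes, question: str, image_id: int = 10) -> dict:
    """Build a /completion request body that references one image by id (hypothetical helper)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    # The `[img-<id>]` tag marks where the image embedding is spliced into the prompt.
    prompt = f"USER:[img-{image_id}]{question}\nASSISTANT:"
    return {
        "prompt": prompt,
        "image_data": [{"data": encoded, "id": image_id}],
        "n_predict": 64,  # cap the length of the generated description
        "temperature": 0.1,  # keep captions fairly deterministic
    }


def describe_frame(image_bytes: bytes, question: str = "Describe what you see.") -> str:
    """POST one encoded frame to the server and return the generated text (hypothetical helper)."""
    body = json.dumps(build_completion_payload(image_bytes, question)).encode("utf-8")
    request = urllib.request.Request(
        SERVER_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        # The server returns the generated text in the "content" field.
        return json.loads(response.read())["content"]
```

For a live feed, you would call `describe_frame` repeatedly on JPEG-encoded webcam frames, for example frames captured with OpenCV and encoded with `cv2.imencode(".jpg", frame)`. Keeping `n_predict` small shortens each round trip, which matters more than caption length when the goal is near-real-time output.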

379 stars. No commits in the last 6 months.

Use this if you need a local, real-time solution for generating descriptive text from images or a live webcam feed on Apple silicon.

Not ideal if you need a cloud-based solution, require broad cross-platform support beyond Apple silicon, or need highly specialized object detection.

visual-assistance image-captioning video-description accessibility-tech content-analysis

No License · Stale 6m · No Package · No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 8 / 25

Community 16 / 25


Stars: 379

Forks:

Language: Python

License: none

Higher-rated alternatives

ludwig-ai/ludwig

Low-code framework for building custom LLMs, neural networks, and other AI models

withcatai/node-llama-cpp

Run AI models locally on your machine with node.js bindings for llama.cpp. Enforce a JSON schema...

mudler/LocalAI

🤖 The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and...

zhudotexe/kani

kani (カニ) is a highly hackable microframework for tool-calling language models. (NLP-OSS @ EMNLP 2023)

SciSharp/LLamaSharp

A C#/.NET library to run LLM (🦙LLaMA/LLaVA) on your local device efficiently.
