OneInterface/realtime-bakllava
llama.cpp with the BakLLaVA model describes what it sees
This project generates textual descriptions of what is happening in an image or a real-time video feed. You feed it a picture or a live webcam stream, and it tells you what it "sees" as natural language captions. It suits anyone who needs immediate descriptions of visual input, for example people working in accessibility or content analysis.
379 stars. No commits in the last 6 months.
Use this if you need a local, real-time solution to generate descriptive text from images or a live webcam feed on an Apple silicon chip.
Not ideal if you need a cloud-based solution, require broad cross-platform support beyond Apple silicon, or need highly specialized object detection.
Stars: 379
Forks: 41
Language: Python
License: —
Category:
Last pushed: Nov 08, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/OneInterface/realtime-bakllava"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
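The same request can be made from Python. The sketch below is a minimal example using only the standard library. The function names (`build_quality_url`, `fetch_quality`) are hypothetical, and reading the URL path as category, owner, and repository is an assumption inferred from the curl command above. The response format is not documented on this page, so the body is returned as plain text rather than parsed.

```python
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; everything after it is assumed
# to be <category>/<owner>/<repo>, which this page does not confirm.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the request URL, percent-encoding each path segment."""
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0) -> str:
    """Fetch the record and return the raw response body as text.

    The response format is an unknown here, so no parsing is attempted.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Performs a real network request and counts against the daily limit.
    print(fetch_quality("transformers", "OneInterface", "realtime-bakllava"))
```

If the body turns out to be JSON, it can be parsed with `json.loads`. Given the 100 requests/day limit without a key, caching the result locally is likely worthwhile when polling many repositories.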
Higher-rated alternatives
ludwig-ai/ludwig
Low-code framework for building custom LLMs, neural networks, and other AI models
withcatai/node-llama-cpp
Run AI models locally on your machine with node.js bindings for llama.cpp. Enforce a JSON schema...
mudler/LocalAI
🤖 The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and...
zhudotexe/kani
kani (カニ) is a highly hackable microframework for tool-calling language models. (NLP-OSS @ EMNLP 2023)
SciSharp/LLamaSharp
A C#/.NET library to run LLM (🦙LLaMA/LLaVA) on your local device efficiently.