JIA-Lab-research/Q-LLM
This is the official repo of "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models"
This project helps you get accurate answers from Large Language Models (LLMs) when working with very long documents, such as entire books or extensive research papers. Given a long document and a specific question, it uses an LLM to quickly locate and summarize the most relevant passages and return a precise answer. It is aimed at researchers, analysts, or anyone who needs to extract detailed insights from massive texts without manually sifting through them.
No commits in the last 6 months.
Use this if you frequently need to query and get specific answers from extremely long text documents using an LLM, without sacrificing accuracy or waiting a long time.
Not ideal if your primary use case involves short, conversational interactions with an LLM or if you are not working with lengthy documents that exceed typical LLM context windows.
Stars: 55
Forks: 4
Language: Python
License: —
Category: —
Last pushed: Jul 16, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/JIA-Lab-research/Q-LLM"
Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000 requests/day.
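If you prefer to call the endpoint from code rather than curl, a minimal sketch in Python follows. It assumes the endpoint returns JSON (not confirmed on this page) and uses the keyless tier described above; requests is a third-party dependency.

# Minimal sketch: fetch the quality data for this repo from the public endpoint.
# Assumption: the endpoint returns a JSON body; the keyless tier (100 requests/day) is used.
import requests

URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/JIA-Lab-research/Q-LLM"

def fetch_quality_data(url: str = URL) -> dict:
    """Request the repo quality record and return the parsed JSON body."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # surface 4xx/5xx errors (e.g. rate limiting) early
    return response.json()

if __name__ == "__main__":
    print(fetch_quality_data())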
Higher-rated alternatives
quic/efficient-transformers
This library empowers users to seamlessly port pretrained models and checkpoints on the...
ManuelSLemos/RabbitLLM
Run 70B+ LLMs on a single 4GB GPU — no quantization required.
alpa-projects/alpa
Training and serving large-scale neural networks with auto parallelization.
arm-education/Advanced-AI-Hardware-Software-Co-Design
Hands-on course materials for ML engineers to master extreme model quantization and on-device...
IST-DASLab/marlin
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batchsizes...