JIA-Lab-research/Q-LLM
This is the official repo of "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models"
This project helps you get accurate answers from Large Language Models (LLMs) when working with very long documents, such as entire books or extensive research papers. Given a long document and a specific question, it uses an LLM to quickly locate and summarize the most relevant passages and return a precise answer. It is aimed at researchers, analysts, or anyone who needs to extract detailed insights from massive texts without manually sifting through them.
No commits in the last 6 months.
Use this if you frequently need to query and get specific answers from extremely long text documents using an LLM, without sacrificing accuracy or waiting a long time.
Not ideal if your primary use case involves short, conversational interactions with an LLM or if you are not working with lengthy documents that exceed typical LLM context windows.
Stars: 55
Forks: 4
Language: Python
License: —
Category: —
Last pushed: Jul 16, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/JIA-Lab-research/Q-LLM"
Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000 requests/day.
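If you prefer to call the endpoint from code rather than curl, a minimal sketch in Python follows. It assumes the endpoint returns JSON (not confirmed on this page) and uses the keyless tier described above; requests is a third-party dependency.

# Minimal sketch: fetch the quality data for this repo from the public endpoint.
# Assumption: the endpoint returns a JSON body; the keyless tier (100 requests/day) is used.
import requests

URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/JIA-Lab-research/Q-LLM"

def fetch_quality_data(url: str = URL) -> dict:
    """Request the repo quality record and return the parsed JSON body."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # surface 4xx/5xx errors (e.g. rate limiting) early
    return response.json()

if __name__ == "__main__":
    print(fetch_quality_data())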
Higher-rated alternatives
quic/efficient-transformers
This library empowers users to seamlessly port pretrained models and checkpoints on the...
ManuelSLemos/RabbitLLM
Run 70B+ LLMs on a single 4GB GPU — no quantization required.
alpa-projects/alpa
Training and serving large-scale neural networks with auto parallelization.
arm-education/Advanced-AI-Hardware-Software-Co-Design
Hands-on course materials for ML engineers to master extreme model quantization and on-device...
IST-DASLab/marlin
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batchsizes...