somewheresystems/dataclysm

Pull high-quality, efficient embeddings for PubMed, arXiv, and Wikipedia from Hugging Face and use them for local LLM inference and Retrieval Augmented Generation (RAG).

/ 100

Experimental

This tool helps researchers and knowledge workers explore vast scientific and general knowledge databases like PubMed, arXiv, and Wikipedia. You provide a search query, and it returns highly relevant articles and summaries. It's designed for anyone needing to quickly find and understand information from large academic or informational text collections.
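The query-to-relevant-articles step described above is commonly built on nearest-neighbor search over embeddings. The sketch below illustrates that general idea only; it is not taken from the repository. The titles, the three-dimensional vectors, and the helper names `cosine` and `top_k` are all hypothetical, and a real setup would use the precomputed embeddings and an embedding model for the query.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=2):
    """Return (index, score) pairs for the k documents most similar to the query."""
    scored = [(cosine(query_vec, vec), i) for i, vec in enumerate(doc_vecs)]
    scored.sort(reverse=True)  # highest similarity first
    return [(i, score) for score, i in scored[:k]]

# Hypothetical toy data standing in for real article embeddings.
titles = ["Protein folding review", "Graph neural networks", "History of Rome"]
doc_embeddings = [
    [0.9, 0.1, 0.0],
    [0.1, 0.9, 0.1],
    [0.0, 0.2, 0.9],
]
# Hypothetical embedding of the user's search query.
query_embedding = [0.2, 1.0, 0.0]

results = top_k(query_embedding, doc_embeddings, k=2)
for index, score in results:
    print(f"{titles[index]}: {score:.3f}")
```

In a RAG setup, the text of the top-ranked articles would then be passed to a local LLM as context for summarization or question answering. At the scale of millions of articles, an approximate nearest-neighbor index would normally replace this exhaustive scan.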

No commits in the last 6 months.

Use this if you need to efficiently search and summarize information across millions of academic papers or Wikipedia articles.

Not ideal if you are looking to analyze very short texts or data outside of research papers and general encyclopedic content.

scientific-research literature-review knowledge-discovery information-retrieval academic-search

Stale 6m No Package No Dependents

Maintenance 0 / 25

Adoption 8 / 25

Maturity 16 / 25

Community 5 / 25

Language: Jupyter Notebook

License: Apache-2.0

Higher-rated alternatives

denser-org/denser-retriever

An enterprise-grade AI retriever designed to streamline AI integration into your applications,...

rayliuca/T-Ragx

Enhancing Translation with RAG-Powered Large Language Models

neuml/rag

🚀 Retrieval Augmented Generation (RAG) with txtai. Combine search and LLMs to find insights with...

NovaSearch-Team/RAG-Retrieval

Unify Efficient Fine-tuning of RAG Retrieval, including Embedding, ColBERT, ReRanker.

RulinShao/retrieval-scaling

Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore".
