somewheresystems/dataclysm
Pull high-quality, efficient embeddings for PubMed, arXiv, and Wikipedia from Hugging Face and use them for local LLM inference and Retrieval-Augmented Generation (RAG)
This tool helps researchers and knowledge workers explore vast scientific and general knowledge databases like PubMed, arXiv, and Wikipedia. You provide a search query, and it returns highly relevant articles and summaries. It's designed for anyone needing to quickly find and understand information from large academic or informational text collections.
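The query-to-articles flow described above can be illustrated with a minimal sketch of embedding-based retrieval. This is not the project's own code: the titles, the three-dimensional vectors, and the helper names `cosine_similarity` and `top_k` are hypothetical stand-ins, and real embeddings would have hundreds of dimensions and come from a published dataset rather than a hand-written dictionary.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k(query_vec, corpus, k=2):
    """Return the k (title, score) pairs most similar to query_vec."""
    scored = [(title, cosine_similarity(query_vec, vec))
              for title, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy corpus: title -> precomputed embedding.
CORPUS = {
    "Protein folding survey": [0.9, 0.1, 0.0],
    "Transformer architectures": [0.1, 0.9, 0.2],
    "History of Rome": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of the user's search query.
query = [0.8, 0.2, 0.1]
results = top_k(query, CORPUS, k=2)
for title, score in results:
    print(f"{score:.3f}  {title}")
```

In a RAG setup, the text of the top-ranked articles would then be passed to a local LLM as context for summarization. A brute-force scan like this is only practical for small collections; at the scale of millions of articles an approximate nearest-neighbor index is the usual choice.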
No commits in the last 6 months.
Use this if you need to efficiently search and summarize information across millions of academic papers or Wikipedia articles.
Not ideal if you are looking to analyze very short texts or data outside of research papers and general encyclopedic content.
Stars
47
Forks
2
Language
Jupyter Notebook
License
Apache-2.0
Category
Last pushed
Feb 16, 2024
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/somewheresystems/dataclysm"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
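The same request can be made from Python using only the standard library. The sketch below assumes the URL follows the pattern category/owner/repo, which is inferred from the single curl example above and not from any API documentation; the helper names `build_quality_url` and `fetch_quality` are hypothetical, and the response is returned as raw text because its format is not described here.

```python
import urllib.request
from urllib.parse import quote

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category, owner, repo):
    """Build the endpoint URL. The category/owner/repo layout is an
    assumption inferred from the curl example, not a documented contract."""
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category, owner, repo, timeout=10):
    """Fetch the record and return the response body as text.
    Assumes a UTF-8 body; no API key is sent (keyless tier)."""
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


url = build_quality_url("rag", "somewheresystems", "dataclysm")
print(url)
# To perform the request (counts against the daily quota):
# print(fetch_quality("rag", "somewheresystems", "dataclysm"))
```

The network call is left commented out so that running the snippet does not consume one of the 100 keyless requests per day.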
Higher-rated alternatives
denser-org/denser-retriever
An enterprise-grade AI retriever designed to streamline AI integration into your applications,...
rayliuca/T-Ragx
Enhancing Translation with RAG-Powered Large Language Models
neuml/rag
🚀 Retrieval Augmented Generation (RAG) with txtai. Combine search and LLMs to find insights with...
NovaSearch-Team/RAG-Retrieval
Unify Efficient Fine-tuning of RAG Retrieval, including Embedding, ColBERT, ReRanker.
RulinShao/retrieval-scaling
Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore".