LLM Domain Datasets Embedding Tools

There are 9 LLM domain dataset tools tracked. The highest-rated is ewok-core/ewok-paper, at 36/100 with 7 stars.

Get all 9 projects as JSON:

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=embeddings&subcategory=llm-domain-datasets&limit=20"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
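The same request can be made from a script. The following is a minimal Python sketch, using only the standard library, that rebuilds the query URL from the parameters in the curl command above. The helper names `build_url` and `fetch_projects` are hypothetical, and the shape of the JSON response is not documented on this page, so the sketch only decodes it and makes no assumptions about its fields.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="embeddings", subcategory="llm-domain-datasets", limit=20):
    """Assemble the query URL from the parameters the curl command uses."""
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=10):
    """Fetch and decode the JSON body (hypothetical helper).

    The response schema is an unknown here, so the decoded object is
    returned as-is for the caller to inspect.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    url = build_url()
    print(url)
    # Uncomment to make a live request (it counts against the daily quota):
    # print(json.dumps(fetch_projects(url), indent=2))
```

Keeping URL construction separate from the network call makes it easy to change `limit` or the subcategory without touching the fetch logic, and to check the URL without spending a request.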

| # | Tool | Description | Score | Tier |
|---|------|-------------|-------|------|
| 1 | ewok-core/ewok-paper | Elements of World Knowledge! This repository houses data and code needed to... | 36 | Emerging |
| 2 | itrummer/thalamusdb | ThalamusDB: semantic query processing on multimodal data | 35 | Emerging |
| 3 | texttron/hyde | HyDE: Precise Zero-Shot Dense Retrieval without Relevance Labels | 33 | Emerging |
| 4 | ArslanKAS/Large-Language-Models-with-Semantic-Search | Explore from keyword search to dense retrieval and reranking, which injects... | 31 | Emerging |
| 5 | Ahren09/SciEvo | A longitudinal dataset for academic literature, including papers, metadata,... | 29 | Experimental |
| 6 | jzhoubu/vsearch | An Extensible Framework for Retrieval-Augmented LLM Applications: Learning... | 26 | Experimental |
| 7 | infosenselab/frameref | Large-scale dataset and simulation framework for studying information health. | 21 | Experimental |
| 8 | KRR-Oxford/LM-ontology-concept-placement | Language Model based ontology concept placement | 21 | Experimental |
| 9 | nnliu1/sem_annotation | Ontology term recommendation system for semantic annotation | 13 | Experimental |