pinecone-io/pinecone-datasets

An open-source dataset library for pre-embedded datasets: create your own data catalog, or use Pinecone's public datasets.

51 / 100 (Established)

This project helps data scientists, machine learning engineers, and developers who work with vector databases to easily access and manage pre-embedded datasets. You can load existing public datasets, which contain vectorized text or other data, and use them to quickly populate a vector index. This allows for rapid prototyping and testing of semantic search or recommendation systems.
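The load-then-populate workflow described above might look roughly like the following. This is a minimal sketch, not the project's documented recipe: it assumes the `pinecone-datasets` package is installed, that `load_dataset` returns an object exposing its documents as a pandas DataFrame under `.documents`, and that network access to the public catalog is available. The `batched` helper and the `iter_document_batches` name are hypothetical, introduced only for this illustration.

```python
from typing import Dict, Iterable, Iterator, List


def batched(records: Iterable[Dict], batch_size: int) -> Iterator[List[Dict]]:
    """Yield lists of at most `batch_size` records (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[Dict] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch.
        yield batch


def iter_document_batches(name: str, batch_size: int = 100) -> Iterator[List[Dict]]:
    """Load a public dataset by name and yield its documents in batches.

    Assumption: `pinecone_datasets.load_dataset(name)` returns a dataset
    whose `.documents` attribute is a pandas DataFrame.
    """
    # Imported lazily so the batching helper works without the package.
    from pinecone_datasets import load_dataset

    dataset = load_dataset(name)
    records = dataset.documents.to_dict(orient="records")
    yield from batched(records, batch_size)
```

Each yielded batch could then be passed to whatever upsert call the target vector database provides. Valid dataset names come from the catalog itself; the package's `list_datasets()` function is the place to look them up rather than hard-coding one.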

Use this if you need to quickly access and load pre-embedded datasets for use with a vector database like Pinecone, or if you want to create and manage your own catalog of vector datasets.

Not ideal if you need to generate embeddings from raw text or other data yourself, or if you are working with traditional relational databases.

semantic-search vector-databases machine-learning-datasets information-retrieval data-cataloging
No Package, No Dependents
Maintenance 10 / 25
Adoption 7 / 25
Maturity 16 / 25
Community 18 / 25

Stars: 34
Forks: 14
Language: Python
License: Apache-2.0
Last pushed: Mar 12, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/pinecone-io/pinecone-datasets"

Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
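The same request can be made from Python using only the standard library. The sketch below assumes the endpoint returns a JSON body, which the page does not state; the function names are hypothetical, and the `embeddings` path segment is copied from the curl URL as-is.

```python
import json
import urllib.request

# Base path copied from the curl example; "embeddings" is kept verbatim.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/embeddings"


def build_quality_url(owner: str, repo: str) -> str:
    """Build the quality-score URL for a repository."""
    return f"{BASE_URL}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality data for a repository.

    Assumption: the response body is JSON. No API key is sent, so the
    keyless limit of 100 requests/day applies.
    """
    url = build_quality_url(owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("pinecone-io", "pinecone-datasets")` issues the same GET request as the curl command. Given the daily limit, caching the result locally is worth considering for repeated lookups.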