jmschrei/apricot

apricot implements submodular optimization for the purpose of selecting subsets of massive data sets to train machine learning models quickly. See the documentation page: https://apricot-select.readthedocs.io/en/latest/index.html

49 / 100 (Emerging)

When working with massive datasets, this tool helps data scientists, machine learning engineers, and researchers summarize them into smaller, representative subsets. You input your large dataset, and it outputs a curated subset that still reflects the original data's diversity and characteristics. This is useful for faster model training and data visualization.
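To make the idea of selecting a representative subset concrete, here is a minimal sketch of greedy maximization of a facility-location objective, a standard example of a submodular function. It is not apricot's implementation or API: the function name `facility_location_greedy`, the distance-based similarity, and the toy data are assumptions made for illustration, and the pure-Python loops are only practical for small inputs.

```python
import math


def facility_location_greedy(points, k):
    """Greedily pick up to k indices maximizing sum_j max_{i in S} sim(i, j).

    Hypothetical helper for illustration; not part of apricot.
    """
    n = len(points)
    if n == 0 or k <= 0:
        return []

    # Pairwise Euclidean distances, converted to non-negative similarities.
    dists = [[math.dist(p, q) for q in points] for p in points]
    max_d = max(max(row) for row in dists)
    sim = [[max_d - d for d in row] for row in dists]

    best = [0.0] * n   # best[j]: similarity of point j to its closest selected point
    selected = []
    chosen = set()
    for _ in range(min(k, n)):
        best_gain, best_i = -1.0, None
        for i in range(n):
            if i in chosen:
                continue
            # Marginal gain: how much adding i improves coverage of every point.
            gain = sum(max(sim[i][j] - best[j], 0.0) for j in range(n))
            if gain > best_gain:
                best_gain, best_i = gain, i
        selected.append(best_i)
        chosen.add(best_i)
        best = [max(best[j], sim[best_i][j]) for j in range(n)]
    return selected


# Toy data: three tight groups of 2-D points.
points = [(0.0, 0.0), (0.1, 0.0), (10.0, 10.0), (10.1, 10.0), (20.0, 0.0), (20.0, 0.1)]
print(facility_location_greedy(points, 3))
```

Each step adds the point with the largest marginal gain, so points near already-selected ones contribute little and the selection spreads across the data. For real workloads, see the apricot documentation linked above rather than this sketch.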

528 stars.

Use this if you need to shrink a very large dataset to speed up machine learning model training or to make data exploration and visualization more manageable, while keeping the subset representative of the full data.

Not ideal if your dataset is already small or if you need to select a subset based on very specific, pre-defined criteria rather than overall data representation.

data-summarization machine-learning-optimization dataset-curation data-visualization big-data-sampling
No Package, No Dependents
Maintenance 6 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 17 / 25


Stars: 528
Forks: 52
Language: Jupyter Notebook
License: MIT
Last pushed: Nov 17, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/jmschrei/apricot"

Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000 requests/day.
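The same request can be made from Python with only the standard library. This is a sketch under assumptions: the `category/owner/repo` path layout is inferred from the single URL shown above, the response is assumed to be JSON (the page does not say), and the helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL; the path layout is inferred from the curl example."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10):
    """Fetch the quality record, assuming the endpoint returns a JSON body."""
    with urllib.request.urlopen(quality_url(category, owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example call (performs a network request and counts against the daily limit):
# data = fetch_quality("ml-frameworks", "jmschrei", "apricot")
```

With the keyless limit of 100 requests/day, caching the result locally is a sensible default for anything that runs repeatedly.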