jmschrei/apricot
apricot implements submodular optimization for the purpose of selecting subsets of massive data sets to train machine learning models quickly. See the documentation page: https://apricot-select.readthedocs.io/en/latest/index.html
apricot helps data scientists, machine learning engineers, and researchers summarize massive datasets into smaller, representative subsets. You pass in a large dataset and get back a subset that still reflects the diversity and characteristics of the original data. This is useful for faster model training and data visualization.
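To make the idea of submodular subset selection concrete, here is a minimal pure-Python sketch of greedy facility-location selection, one standard submodular objective. It is an illustration only and not apricot's implementation or API: the function names `similarity` and `greedy_facility_location`, the distance-based similarity, and the toy data are all assumptions made for this example. For the library's real interface, see the documentation page linked above.

```python
import math


def similarity(a, b):
    """Illustrative similarity: 1.0 at distance 0, decaying toward 0 with Euclidean distance."""
    return 1.0 / (1.0 + math.dist(a, b))


def greedy_facility_location(points, k):
    """Greedily pick k indices that maximize the facility-location objective.

    The objective is sum_i max_{j in selected} similarity(points[i], points[j]),
    i.e. how well every point is "covered" by its most similar selected point.
    This naive version costs O(k * n^2) time and O(n^2) memory.
    """
    n = len(points)
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and len(points)")

    # Pairwise similarities, computed once up front.
    sim = [[similarity(p, q) for q in points] for p in points]

    best = [0.0] * n  # best[i]: similarity of point i to its closest selected point
    selected = []

    for _ in range(k):
        best_gain, best_j = -1.0, None
        for j in range(n):
            if j in selected:
                continue
            # Marginal gain of adding j: total improvement in coverage over all points.
            gain = sum(max(0.0, sim[i][j] - best[i]) for i in range(n))
            if gain > best_gain:
                best_gain, best_j = gain, j
        selected.append(best_j)
        best = [max(best[i], sim[i][best_j]) for i in range(n)]

    return selected


# Toy data: three tight clusters of four points each (indices 0-3, 4-7, 8-11).
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (-0.1, 0.0),
    (10.0, 10.0), (10.1, 10.0), (10.0, 10.1), (9.9, 10.0),
    (-10.0, 10.0), (-10.1, 10.0), (-10.0, 10.1), (-9.9, 10.0),
]
picked = greedy_facility_location(points, 3)
print(picked)
```

With three well-separated clusters and k = 3, the greedy picks end up spread across the clusters rather than concentrated in one, which is the "representative subset" behavior described above. A production library would avoid the full pairwise matrix and the naive rescan of every candidate.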
Use this if you need to shrink a very large dataset to speed up machine learning model training or to make data exploration and visualization more manageable, while keeping the subset representative of the full data.
Not ideal if your dataset is already small or if you need to select a subset based on very specific, pre-defined criteria rather than overall data representation.
Stars: 528
Forks: 52
Language: Jupyter Notebook
License: MIT
Category:
Last pushed: Nov 17, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/jmschrei/apricot"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
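The same request can be made from Python with only the standard library. This is a hedged sketch: the helper names `build_quality_url` and `fetch_quality` are hypothetical, the split of the path into category, owner, and repository is inferred from the curl example above, and the assumption that the endpoint returns JSON is not confirmed by this page.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; everything after it is assumed to be
# <category>/<owner>/<repo>.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category, owner, repo):
    """Build the endpoint URL, percent-encoding each path segment."""
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return "/".join([BASE_URL, *parts])


def fetch_quality(category, owner, repo, timeout=10.0):
    """Fetch the record for one repository (assumes the response body is JSON)."""
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (performs a network request and counts against the daily limit):
# data = fetch_quality("ml-frameworks", "jmschrei", "apricot")
```

No key is sent here, so the call falls under the keyless daily limit described above.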
Higher-rated alternatives
feature-engine/feature_engine: Feature engineering and selection open-source Python library compatible with sklearn.
alteryx/featuretools: An open-source Python library for automated feature engineering.
cod3licious/autofeat: Linear prediction model with automated feature engineering and selection capabilities.
abess-team/abess: Fast best-subset selection library.
rodrigo-arenas/Sklearn-genetic-opt: ML hyperparameter tuning and feature selection using evolutionary algorithms.