jmschrei/apricot

apricot implements submodular optimization for the purpose of selecting subsets of massive data sets to train machine learning models quickly. See the documentation page: https://apricot-select.readthedocs.io/en/latest/index.html

Emerging

When working with massive datasets, this tool helps data scientists, machine learning engineers, and researchers summarize them into smaller, representative subsets. You input your large dataset, and it outputs a curated subset that still reflects the original data's diversity and characteristics. This is useful for faster model training and data visualization.
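
To make the idea of submodular subset selection more concrete, here is a minimal pure-Python sketch of greedy selection under a facility-location objective, one common submodular function. It is not apricot's API or implementation: the function names `similarity` and `greedy_facility_location`, the similarity measure, and the toy data are all assumptions chosen for illustration. For real workloads, consult the apricot documentation linked above.

```python
def similarity(a, b):
    """Similarity that decays with squared Euclidean distance (illustrative choice)."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return 1.0 / (1.0 + d2)


def greedy_facility_location(points, k):
    """Greedily pick up to k indices that maximize the facility-location objective.

    The objective is the sum, over every point, of its similarity to the
    most similar selected point. Hypothetical helper, not part of apricot.
    """
    n = len(points)
    sim = [[similarity(points[i], points[j]) for j in range(n)] for i in range(n)]
    best = [0.0] * n  # best similarity of each point to any selected point so far
    selected = []
    for _ in range(min(k, n)):
        best_gain, best_idx = -1.0, None
        for c in range(n):
            if c in selected:
                continue
            # Marginal gain: how much candidate c improves coverage of every point.
            gain = sum(max(sim[c][j] - best[j], 0.0) for j in range(n))
            if gain > best_gain:
                best_gain, best_idx = gain, c
        selected.append(best_idx)
        best = [max(best[j], sim[best_idx][j]) for j in range(n)]
    return selected


# Toy data: three well-separated clusters of five points each.
centers = [(0.0, 0.0), (10.0, 10.0), (20.0, 0.0)]
offsets = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (-0.1, 0.0), (0.0, -0.1)]
points = [(cx + ox, cy + oy) for cx, cy in centers for ox, oy in offsets]

selected = greedy_facility_location(points, k=3)
print(selected)                          # indices of the chosen points
print(sorted(i // 5 for i in selected))  # cluster id of each chosen point
```

On this toy data the three selected points come from three different clusters, because a second point from an already covered cluster adds almost nothing to the objective. This quadratic-memory sketch is only meant to show the selection principle; it would not scale to the massive datasets the tool targets.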

528 stars.

Use this if you need to reduce the size of a very large dataset to speed up machine learning model training or make data exploration and visualization more manageable, without losing critical information.

Not ideal if your dataset is already small or if you need to select a subset based on very specific, pre-defined criteria rather than overall data representation.

data-summarization machine-learning-optimization dataset-curation data-visualization big-data-sampling

No package, no dependents.

Maintenance 6 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 17 / 25

Language: Jupyter Notebook

License: MIT

Higher-rated alternatives

feature-engine/feature_engine

Feature engineering and selection open-source Python library compatible with sklearn.

alteryx/featuretools

An open-source Python library for automated feature engineering.

cod3licious/autofeat

Linear Prediction Model with Automated Feature Engineering and Selection Capabilities

abess-team/abess

Fast Best-Subset Selection Library

rodrigo-arenas/Sklearn-genetic-opt

ML hyperparameter tuning and feature selection using evolutionary algorithms.
