JacksonBurns/astartes

Better Data Splits for Machine Learning

Emerging

When building machine learning models, how you split your dataset into training, validation, and testing sets determines how reliable your performance estimates are. This tool takes your dataset (numerical data or chemical molecule data) and divides it using algorithmic sampling strategies, so your model can be evaluated fairly on both familiar examples (interpolation) and entirely new ones (extrapolation). It's for data scientists, cheminformatics researchers, and anyone creating or validating ML models who needs more robust data splitting than simple random methods.

Used by 2 other packages. No commits in the last 6 months. Available on PyPI.

Use this if you need more reliable and representative train, validation, and test splits for your machine learning models, especially to evaluate how well your model performs on new, unseen data or on data with specific characteristics.

Not ideal if you only need a basic random split and are not concerned with algorithmic sampling strategies or specific considerations for interpolative/extrapolative model performance.
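To make the contrast with a random split concrete, here is a minimal standard-library sketch of one well-known algorithmic sampling strategy, Kennard-Stone selection, next to a plain shuffled split. It is an illustration under stated assumptions, not this package's implementation or API: the function names `kennard_stone_order` and `kennard_stone_split`, the `train_size` parameter, and the toy dataset `X` are all hypothetical.

```python
import math
import random


def kennard_stone_order(points):
    """Return indices of `points` in Kennard-Stone selection order."""
    n = len(points)
    if n < 2:
        return list(range(n))
    # Start with the two points that are farthest apart.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: math.dist(points[pair[0]], points[pair[1]]),
    )
    selected = [first, second]
    # Track each point's distance to its nearest already-selected point.
    min_dist = [
        min(math.dist(p, points[first]), math.dist(p, points[second]))
        for p in points
    ]
    remaining = set(range(n)) - {first, second}
    while remaining:
        # Pick the point farthest from everything selected so far.
        nxt = max(sorted(remaining), key=lambda i: min_dist[i])
        selected.append(nxt)
        remaining.remove(nxt)
        for i in remaining:
            min_dist[i] = min(min_dist[i], math.dist(points[i], points[nxt]))
    return selected


def kennard_stone_split(points, train_size=0.75):
    """Split indices into (train, test); train takes the earliest picks."""
    order = kennard_stone_order(points)
    n_train = round(train_size * len(points))
    return order[:n_train], order[n_train:]


# Hypothetical 1-D dataset: eight evenly spaced feature values.
X = [(float(v),) for v in range(8)]
train_idx, test_idx = kennard_stone_split(X, train_size=0.75)

# For comparison: a plain random split of the same indices.
shuffled = list(range(len(X)))
random.Random(0).shuffle(shuffled)
random_train, random_test = shuffled[:6], shuffled[6:]
```

Because the earliest picks are the extremes of the feature space, the held-out points in this sketch always fall inside the range covered by the training set, which makes it a test of interpolation. A random split gives no such guarantee.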

machine-learning-validation cheminformatics materials-science chemical-kinetics data-splitting

Stale 6m

Maintenance 2 / 25

Adoption 11 / 25

Maturity 25 / 25

Community 9 / 25

Language

Python

License

MIT

Higher-rated alternatives

skrub-data/skrub

Machine learning with dataframes

biolab/orange3

🍊 📊 💡 Orange: Interactive data analysis

root-project/root

The official repository for ROOT: analyzing, storing and visualizing big data, scientifically

cleanlab/cleanlab

Cleanlab's open-source library is the standard data-centric AI package for data quality and...

drivendataorg/deon

A command line tool to easily add an ethics checklist to your data science projects.
