JacksonBurns/astartes

Better Data Splits for Machine Learning

Score: 47 / 100 (Emerging)

When building machine learning models, how you split a dataset into training, validation, and testing sets determines how trustworthy your performance estimates are. This tool takes your dataset (numerical data or chemical molecule data) and divides it using algorithmic sampling strategies rather than a plain random shuffle, so your model can be evaluated fairly both on examples similar to its training data and on entirely new ones. It's for data scientists, cheminformatics researchers, and anyone creating or validating ML models who needs more robust data splitting than simple random methods.

Used by 2 other packages. No commits in the last 6 months. Available on PyPI.

Use this if you need to create more reliable and representative train, validation, and test splits for your machine learning models, especially to evaluate how well your model performs on new, unseen data or specific data characteristics.

Not ideal if you only need a basic random split and are not concerned with algorithmic sampling strategies or specific considerations for interpolative/extrapolative model performance.
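To make the interpolative/extrapolative distinction concrete, here is a minimal standard-library sketch. It is not astartes' API: the function names `random_split` and `extrapolative_split` and the toy `values` list are hypothetical, chosen only for illustration. A random split tends to test the model on points that sit among its training data, while holding out the largest values forces the test set to lie outside the training range.

```python
import random


def random_split(n, train_size=0.8, seed=0):
    """Interpolative baseline: shuffle the indices, then cut once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(round(train_size * n))
    return idx[:cut], idx[cut:]


def extrapolative_split(values, train_size=0.8):
    """Hold out the largest values so the test set lies beyond the training range."""
    # Indices ordered by the value they point at, smallest first.
    order = sorted(range(len(values)), key=lambda i: values[i])
    cut = int(round(train_size * len(values)))
    return order[:cut], order[cut:]


# Hypothetical one-dimensional feature, e.g. a single measured property.
values = [0.5, 3.2, 1.1, 9.7, 4.4, 7.8, 2.6, 6.0, 8.9, 5.3]

train_idx, test_idx = extrapolative_split(values, train_size=0.8)
max_train = max(values[i] for i in train_idx)
min_test = min(values[i] for i in test_idx)

# Every test value is larger than every training value.
print(max_train, min_test)
```

Scoring a model on both kinds of split shows how much of its accuracy survives once it leaves the region covered by the training data.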

machine-learning-validation cheminformatics materials-science chemical-kinetics data-splitting
Stale 6m
Maintenance 2 / 25
Adoption 11 / 25
Maturity 25 / 25
Community 9 / 25


Stars: 98
Forks: 6
Language: Python
License: MIT
Last pushed: Sep 30, 2025
Commits (30d): 0
Dependencies: 5
Reverse dependents: 2

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/JacksonBurns/astartes"

Open to everyone: 100 requests/day with no key, or 1,000/day with a free key.
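The same request can be sketched in Python with only the standard library. Two things here are assumptions rather than documented behaviour: that the path follows a category/owner/repository pattern (inferred from the single URL above), and that the response body is JSON. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL (path pattern assumed from the curl example)."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10):
    """Fetch the quality record; assumes the body is JSON (schema not documented here)."""
    with urllib.request.urlopen(quality_url(category, owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


url = quality_url("ml-frameworks", "JacksonBurns", "astartes")
print(url)
```

Calling `fetch_quality("ml-frameworks", "JacksonBurns", "astartes")` performs the network request; unauthenticated calls count against the 100 requests/day limit.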