JacksonBurns/astartes
Better Data Splits for Machine Learning
When building machine learning models, how you split a dataset into training, validation, and test sets determines how trustworthy your performance estimates are. This tool takes a dataset (numerical data or chemical molecule data) and divides it using algorithmic sampling strategies, so the model can be evaluated both on examples similar to its training data and on examples unlike anything it has seen. It's for data scientists, cheminformatics researchers, and anyone creating or validating ML models who needs more robust data splitting than simple random methods.
Used by 2 other packages. No commits in the last 6 months. Available on PyPI.
Use this if you need more reliable and representative train, validation, and test splits for your machine learning models, especially to evaluate how well a model performs on new, unseen data or on data with specific characteristics.
Not ideal if you only need a basic random split and are not concerned with algorithmic sampling strategies or specific considerations for interpolative/extrapolative model performance.
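To make the interpolative/extrapolative distinction concrete, here is a minimal pure-Python sketch. It does not use astartes and does not reproduce any of its samplers: the function names `random_split` and `farthest_from_center_split`, the 75% training fraction, and the toy data are all assumptions made for illustration. The first function shuffles and cuts, so test points tend to fall among the training points; the second holds out the points farthest from the dataset's centroid, so the test set lies outside the region the model was trained on.

```python
import math
import random


def random_split(points, train_fraction=0.75, seed=0):
    """Interpolative-style split: shuffle the indices, then cut."""
    indices = list(range(len(points)))
    random.Random(seed).shuffle(indices)
    n_train = int(len(points) * train_fraction)
    return sorted(indices[:n_train]), sorted(indices[n_train:])


def farthest_from_center_split(points, train_fraction=0.75):
    """Extrapolative-style split: hold out the points farthest from the centroid.

    Hypothetical illustration only; this is not one of astartes' samplers.
    """
    n_dims = len(points[0])
    centroid = [sum(p[d] for p in points) / len(points) for d in range(n_dims)]
    # Rank indices from closest to farthest from the centroid.
    ranked = sorted(range(len(points)), key=lambda i: math.dist(points[i], centroid))
    n_train = int(len(points) * train_fraction)
    return sorted(ranked[:n_train]), sorted(ranked[n_train:])


# Toy 1-D dataset: eight evenly spaced values, 0.0 through 7.0.
points = [(float(x),) for x in range(8)]

rand_train, rand_test = random_split(points)
extra_train, extra_test = farthest_from_center_split(points)

print("random test indices:       ", rand_test)
print("extrapolative test indices:", extra_test)  # [0, 7], the two edge points
```

In the extrapolative case the held-out points (0.0 and 7.0) sit outside the range covered by the training points (1.0 through 6.0), which is the situation a random split rarely produces on purpose.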
Stars: 98
Forks: 6
Language: Python
License: MIT
Category:
Last pushed: Sep 30, 2025
Commits (30d): 0
Dependencies: 5
Reverse dependents: 2
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/JacksonBurns/astartes"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
skrub-data/skrub
Machine learning with dataframes
biolab/orange3
🍊 📊 💡 Orange: Interactive data analysis
root-project/root
The official repository for ROOT: analyzing, storing and visualizing big data, scientifically
cleanlab/cleanlab
Cleanlab's open-source library is the standard data-centric AI package for data quality and...
drivendataorg/deon
A command line tool to easily add an ethics checklist to your data science projects.