avkl/twinning

Data Twinning

Score: 27 / 100 (Experimental)

Data Twinning partitions a dataset into statistically similar subsets, even when the subsets differ in size. This matters when developing statistical and machine learning models, because training and testing sets that each reflect the distribution of the full data make validation more reliable. You provide a dataset (such as a spreadsheet or database table) as input, and it returns which rows belong to each statistically matched subset. It is aimed at data scientists, machine learning engineers, and statisticians who build and validate models.

No commits in the last 6 months.

Use this if you need to reliably split a dataset into training and testing sets, or generate multiple folds for cross-validation, ensuring each subset maintains the statistical characteristics of the original data.
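To make the idea of a statistically matched split concrete, here is a minimal sketch in plain Python. It is not the avkl/twinning implementation or its API: the function name `twin_like_split`, its parameters, and the greedy nearest-neighbour rule are all assumptions chosen for illustration. The sketch standardizes the columns, then repeatedly sends one row to the small subset and its `r - 1` nearest remaining neighbours to the large subset, so about one row in `r` ends up in the small subset.

```python
import math
import random
import statistics


def twin_like_split(rows, r, start=0):
    """Greedy nearest-neighbour split; about one row in r lands in the small subset.

    Hypothetical helper for illustration only, not the avkl/twinning algorithm.
    """
    if r < 2 or not rows:
        raise ValueError("need at least one row and r >= 2")

    # Standardize each column so no single feature dominates the distances.
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # constant column -> scale 1
    z = [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

    remaining = set(range(len(z)))
    small, current = [], start
    while True:
        # The current row goes to the small subset.
        small.append(current)
        remaining.discard(current)
        if not remaining:
            break
        # Its r - 1 nearest remaining rows go to the large subset.
        group = sorted(remaining, key=lambda i: math.dist(z[i], z[current]))[: r - 1]
        remaining.difference_update(group)
        if not remaining:
            break
        # Continue from the remaining row closest to the farthest group member.
        far = group[-1]
        current = min(remaining, key=lambda i: math.dist(z[i], z[far]))

    small_set = set(small)
    large = [i for i in range(len(z)) if i not in small_set]
    return sorted(small), large


# Usage: an 80/20 split of 100 synthetic two-column rows.
random.seed(0)
rows = [[random.gauss(0, 1), random.gauss(5, 2)] for _ in range(100)]
test_idx, train_idx = twin_like_split(rows, r=5)
print(len(test_idx), len(train_idx))  # 20 80
```

Because every row sent to the small subset is paired with nearby rows in the large subset, both subsets cover the same regions of the data. Repeating the split on the large subset is one way to build several folds, again as an illustration rather than the repository's method.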

Not ideal if your dataset contains only categorical data that cannot be converted to numerical representations, or if you do not require statistically similar data partitions.
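Categorical columns can often be turned into numbers before splitting. The sketch below shows one common option, one-hot encoding, in plain Python. The helper `one_hot_rows` and its `categorical_cols` parameter are hypothetical names, and whether one-hot distances suit a particular dataset is an assumption to check, not something this page states.

```python
def one_hot_rows(rows, categorical_cols):
    """Replace each categorical column with one 0/1 column per distinct value."""
    # Collect the sorted distinct values seen in each categorical column.
    levels = {c: sorted({row[c] for row in rows}) for c in categorical_cols}
    encoded = []
    for row in rows:
        out = []
        for c, value in enumerate(row):
            if c in levels:
                # One indicator per level, in sorted level order.
                out.extend(1.0 if value == lv else 0.0 for lv in levels[c])
            else:
                out.append(float(value))
        encoded.append(out)
    return encoded


mixed = [[1.5, "red"], [2.0, "blue"], [0.5, "red"]]
print(one_hot_rows(mixed, categorical_cols=[1]))
# [[1.5, 0.0, 1.0], [2.0, 1.0, 0.0], [0.5, 0.0, 1.0]]
```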

Tags: data-splitting, machine-learning-model-validation, statistical-modeling, cross-validation, dataset-compression
Stale (6m), No Package, No Dependents
Maintenance 0 / 25
Adoption 7 / 25
Maturity 16 / 25
Community 4 / 25


Stars: 25
Forks: 1
Language: C++
License: Apache-2.0
Category: cpp-ml-libraries
Last pushed: Dec 21, 2022
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/avkl/twinning"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.