avkl/twinning

Data Twinning

Experimental

Data Twinning partitions a dataset into statistically similar subsets, even when those subsets are of different sizes. This matters when developing statistical and machine learning models, because the training and testing sets it produces are each intended to represent the full dataset closely, which makes model validation more reliable. You provide a dataset (such as a spreadsheet or database table) as input, and it returns which rows belong to each statistically matched subset. Data scientists, machine learning engineers, and statisticians who build and validate models will find this useful.
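The listing does not spell out how the subsets are built. As a rough illustration only, here is a minimal C++ sketch of one greedy nearest-neighbour scheme for a two-way split, assuming this general idea rather than the library's exact procedure. A seed row goes to the smaller subset, its r - 1 nearest remaining neighbours go to the larger one, and the next seed is the remaining row nearest to the last neighbour removed. The function name `twin_split`, the brute-force search, and the use of unscaled Euclidean distance are all assumptions made for this example, not the library's API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Point = std::vector<double>;

// Squared Euclidean distance between two points of equal dimension.
double sqdist(const Point& a, const Point& b) {
    double s = 0.0;
    for (std::size_t j = 0; j < a.size(); ++j) {
        const double d = a[j] - b[j];
        s += d * d;
    }
    return s;
}

// Hypothetical helper (not the library's API): labels about n / r rows with 1
// (the smaller subset) and the rest with 0, using greedy nearest-neighbour groups.
std::vector<int> twin_split(const std::vector<Point>& data, std::size_t r,
                            std::size_t start = 0) {
    assert(r >= 1);
    const std::size_t n = data.size();
    std::vector<int> label(n, 0);
    if (n == 0) return label;
    assert(start < n);
    std::vector<bool> used(n, false);
    std::size_t remaining = n;
    std::size_t u = start;  // current seed row
    while (true) {
        // The seed joins the smaller subset and leaves the pool.
        label[u] = 1;
        used[u] = true;
        --remaining;
        // Collect unused rows; only the k nearest to the seed need sorting.
        std::vector<std::size_t> pool;
        for (std::size_t i = 0; i < n; ++i)
            if (!used[i]) pool.push_back(i);
        const std::size_t k = std::min(r - 1, pool.size());
        std::partial_sort(pool.begin(), pool.begin() + k, pool.end(),
                          [&](std::size_t a, std::size_t b) {
                              return sqdist(data[a], data[u]) < sqdist(data[b], data[u]);
                          });
        // The seed's k nearest neighbours go to the larger subset (label stays 0).
        std::size_t last = u;
        for (std::size_t t = 0; t < k; ++t) {
            used[pool[t]] = true;
            last = pool[t];
        }
        remaining -= k;
        if (remaining == 0) break;
        // Next seed: the unused row closest to the last neighbour removed.
        double best = std::numeric_limits<double>::max();
        for (std::size_t i = 0; i < n; ++i) {
            if (used[i]) continue;
            const double d = sqdist(data[i], data[last]);
            if (d < best) {
                best = d;
                u = i;
            }
        }
    }
    return label;
}
```

Calling `twin_split(rows, 5)` would mark roughly one row in five with label 1, giving an approximately 80/20 test/train split. The brute-force neighbour search is quadratic in the number of rows, so a practical implementation would likely use a spatial index such as a k-d tree instead.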

No commits in the last 6 months.

Use this if you need to split a dataset reliably into training and testing sets, or to generate multiple folds for cross-validation, while keeping each subset statistically representative of the original data.
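For cross-validation, a greedy nearest-neighbour idea can be extended from two subsets to k folds. The sketch below is a hypothetical generalisation written for illustration, not the library's API: each seed row and its k - 1 nearest remaining neighbours are dealt out one per fold, and the fold that receives the seed rotates from group to group so that no fold systematically collects the seeds. The names `nn_folds` and `train_test_for_fold`, and the use of unscaled Euclidean distance, are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

using Point = std::vector<double>;

// Squared Euclidean distance between two points of equal dimension.
double sqdist(const Point& a, const Point& b) {
    double s = 0.0;
    for (std::size_t j = 0; j < a.size(); ++j) {
        const double d = a[j] - b[j];
        s += d * d;
    }
    return s;
}

// Hypothetical helper: assigns each row a fold id in [0, k).
std::vector<std::size_t> nn_folds(const std::vector<Point>& data, std::size_t k) {
    assert(k >= 1);
    const std::size_t n = data.size();
    std::vector<std::size_t> fold(n, 0);
    std::vector<bool> used(n, false);
    std::size_t remaining = n;
    std::size_t u = 0;      // current seed row
    std::size_t group = 0;  // index of the current group of up to k rows
    while (remaining > 0) {
        // Gather unused rows other than the seed; sort its k - 1 nearest to the front.
        std::vector<std::size_t> pool;
        for (std::size_t i = 0; i < n; ++i)
            if (!used[i] && i != u) pool.push_back(i);
        const std::size_t m = std::min(k - 1, pool.size());
        std::partial_sort(pool.begin(), pool.begin() + m, pool.end(),
                          [&](std::size_t a, std::size_t b) {
                              return sqdist(data[a], data[u]) < sqdist(data[b], data[u]);
                          });
        // Deal the seed and its neighbours one per fold, rotating the starting fold.
        std::vector<std::size_t> members{u};
        members.insert(members.end(), pool.begin(), pool.begin() + m);
        for (std::size_t t = 0; t < members.size(); ++t) {
            fold[members[t]] = (t + group) % k;
            used[members[t]] = true;
        }
        remaining -= members.size();
        ++group;
        // Next seed: the unused row closest to the last member dealt out.
        const std::size_t last = members.back();
        double best = std::numeric_limits<double>::max();
        for (std::size_t i = 0; i < n; ++i) {
            if (used[i]) continue;
            const double d = sqdist(data[i], data[last]);
            if (d < best) { best = d; u = i; }
        }
    }
    return fold;
}

// Splits row indices into (training, testing), holding out fold j for testing.
std::pair<std::vector<std::size_t>, std::vector<std::size_t>>
train_test_for_fold(const std::vector<std::size_t>& fold, std::size_t j) {
    std::vector<std::size_t> train, test;
    for (std::size_t i = 0; i < fold.size(); ++i)
        (fold[i] == j ? test : train).push_back(i);
    return {train, test};
}
```

With k = 5, holding out each fold in turn through `train_test_for_fold` would give five training/testing pairs of roughly 80/20 size.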

Not ideal if your dataset contains only categorical data that cannot be converted to numerical representations, or if you do not require statistically similar data partitions.
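One common way to turn a categorical column into numbers before splitting is one-hot encoding. The sketch below assumes rows are stored as `std::vector<double>` and appends one indicator column per distinct category. The function name `append_one_hot` is hypothetical, and the library may expect its input in a different form.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: appends one indicator (0/1) column per distinct category
// in `column` to the matching row of `rows`. Categories are ordered alphabetically.
void append_one_hot(std::vector<std::vector<double>>& rows,
                    const std::vector<std::string>& column) {
    assert(rows.size() == column.size());
    // Assign each distinct category a column offset, in sorted order.
    std::map<std::string, std::size_t> index;
    for (const std::string& value : column) index.emplace(value, 0);
    std::size_t next = 0;
    for (auto& entry : index) entry.second = next++;
    // Extend every row with its indicator columns.
    for (std::size_t i = 0; i < rows.size(); ++i) {
        std::vector<double> indicators(index.size(), 0.0);
        indicators[index.at(column[i])] = 1.0;
        rows[i].insert(rows[i].end(), indicators.begin(), indicators.end());
    }
}
```

Two rows with different categories end up a squared distance of 2 apart in the indicator columns, so it may be worth standardising the numeric columns so that neither kind of feature dominates a distance-based split.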

data-splitting machine-learning-model-validation statistical-modeling cross-validation dataset-compression

Stale 6m · No Package · No Dependents

Maintenance 0 / 25

Adoption 7 / 25

Maturity 16 / 25

Community 4 / 25

Language: C++

License: Apache-2.0

Higher-rated alternatives

davisking/dlib

A toolkit for making real world machine learning and data analysis applications in C++

ZigRazor/CXXGraph

Header-Only C++ Library for Graph Representation and Algorithms

apache/singa

A distributed deep learning platform

mlpack/mlpack

mlpack: a fast, header-only C++ machine learning library

hosseinmoein/DataFrame

C++ DataFrame for statistical, financial, and ML analysis in modern C++
