avkl/twinning
Data Twinning
Data Twinning partitions a dataset into statistically similar subsets, even when the subsets differ in size. This matters when developing statistical and machine learning models, because training and testing sets that both resemble the full dataset make validation more trustworthy. You provide a dataset (such as a spreadsheet or database table) as input, and the tool returns which rows belong to each statistically matched subset. It is aimed at data scientists, machine learning engineers, and statisticians who build and validate models.
No commits in the last 6 months.
Use this if you need to reliably split a dataset into training and testing sets, or generate multiple folds for cross-validation, ensuring each subset maintains the statistical characteristics of the original data.
Not ideal if your dataset contains only categorical data that cannot be converted to numerical representations, or if you do not require statistically similar data partitions.
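To make the idea of "returning which rows belong to each subset" concrete, here is a minimal sketch in plain Python. It is not the twinning package's API or implementation: the function name `greedy_similar_split` and its parameters `r` and `start` are hypothetical, and the greedy nearest-neighbor strategy is a simplified stand-in chosen for illustration. It assumes every row is already numeric and on a comparable scale.

```python
import math
import random


def greedy_similar_split(rows, r, start=0):
    """Split row indices so that roughly 1 in every r rows lands in the subset.

    Illustrative only: take a row, group it with its r-1 nearest remaining
    rows, send the row to the subset and the neighbors to the remainder,
    then continue from the row closest to the last neighbor taken.
    """
    remaining = set(range(len(rows)))
    subset, rest = [], []
    current = start
    while remaining:
        remaining.discard(current)
        subset.append(current)
        # r-1 nearest remaining rows by Euclidean distance
        neighbors = sorted(
            remaining, key=lambda j: math.dist(rows[current], rows[j])
        )[: r - 1]
        rest.extend(neighbors)
        remaining.difference_update(neighbors)
        if not remaining:
            break
        # Move on to the row nearest the last neighbor that was removed
        last = neighbors[-1] if neighbors else current
        current = min(remaining, key=lambda j: math.dist(rows[last], rows[j]))
    return sorted(subset), sorted(rest)


# Usage: 100 synthetic two-column rows, one fifth held out for testing
random.seed(0)
data = [(random.gauss(0, 1), random.gauss(5, 2)) for _ in range(100)]
test_idx, train_idx = greedy_similar_split(data, r=5)
print(len(test_idx), len(train_idx))  # 20 80
```

Because each held-out row is taken from a small neighborhood whose other members stay in the remainder, both index lists cover the same regions of the data. That is the property a statistically matched split aims for, in contrast to a purely random split.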
Stars: 25
Forks: 1
Language: C++
License: Apache-2.0
Category:
Last pushed: Dec 21, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/avkl/twinning"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
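The same request can be made from Python using only the standard library. This is a rough sketch: the helper names `quality_url` and `fetch_quality` are hypothetical, the meaning of the two path segments is inferred from the curl URL above, and the response is returned as raw text on the assumption that it is UTF-8, since its format is not documented here.

```python
import urllib.request

# Base path taken from the curl example; everything after it is
# assumed to be "<section>/<owner>/<repository>".
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(section, repo):
    """Build the endpoint URL for one repository."""
    return f"{BASE_URL}/{section}/{repo}"


def fetch_quality(section, repo, timeout=10):
    """Issue the GET request and return the body as text (assumed UTF-8)."""
    with urllib.request.urlopen(quality_url(section, repo), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling `fetch_quality("ml-frameworks", "avkl/twinning")` would issue the same request as the curl command, subject to the daily request limit.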
Higher-rated alternatives
davisking/dlib: A toolkit for making real world machine learning and data analysis applications in C++
ZigRazor/CXXGraph: Header-Only C++ Library for Graph Representation and Algorithms
apache/singa: a distributed deep learning platform
mlpack/mlpack: mlpack: a fast, header-only C++ machine learning library
hosseinmoein/DataFrame: C++ DataFrame for statistical, financial, and ML analysis in modern C++