Missing Data Imputation ML Frameworks

Tools and frameworks for handling, imputing, and analyzing missing values in datasets across various modalities and domains. Does NOT include general data cleaning, time-series forecasting without imputation focus, or synthetic data generation unrelated to missingness mechanisms.

28 missing data imputation frameworks are tracked. One scores above 70 (Verified tier): the highest-rated, sktime/skpro, at 71/100 with 314 stars.

Get all 28 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=ml-frameworks&subcategory=missing-data-imputation&limit=20"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
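The curl request above can also be issued from a script. The following is a minimal Python sketch using only the standard library. The endpoint and query parameters are copied from the curl command; the helper names `build_url` and `fetch_projects` are hypothetical, and the shape of the JSON response is not documented here, so the sketch returns the parsed body without assuming any field names. The request uses `limit=20` as in the curl command; whether the API accepts a larger value to return all 28 projects in one call is an assumption to verify.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl command in this listing.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="ml-frameworks",
              subcategory="missing-data-imputation",
              limit=20):
    """Build the query URL (hypothetical helper; parameters mirror the curl example)."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=10):
    """Fetch the URL and return the parsed JSON body.

    The response schema is not described in this listing, so nothing
    about its structure is assumed here.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Example usage (performs a network request, counted against the daily quota):
#   data = fetch_projects(build_url())
#   print(json.dumps(data, indent=2)[:500])
```

Keeping URL construction separate from the network call makes it easy to inspect the exact request before spending one of the 100 unauthenticated requests per day.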

| # | Framework | Description | Score | Tier |
|---|-----------|-------------|-------|------|
| 1 | sktime/skpro | A unified framework for tabular probabilistic regression, time-to-event... | 71 | Verified |
| 2 | WenjieDu/Awesome_Imputation | Awesome Deep Learning for Time-Series Imputation, including an unmissable... | 54 | Established |
| 3 | WenjieDu/PyGrinder | PyGrinder: a Python toolkit for grinding data beans into the incomplete for... | 51 | Established |
| 4 | ocbe-uio/imml | A Python package for integrating, processing, and analyzing incomplete... | 48 | Emerging |
| 5 | DoubleML/doubleml-for-r | DoubleML - Double Machine Learning in R | 48 | Emerging |
| 6 | MIDASverse/rMIDAS | R package for missing-data imputation with deep learning | 45 | Emerging |
| 7 | SAP/knn-sampler | Machine learning imputation method to recover the distribution of missing... | 39 | Emerging |
| 8 | vanderschaarlab/hyperimpute | A framework for prototyping and benchmarking imputation methods | 39 | Emerging |
| 9 | aangelopoulos/ltt | Learn then Test: Calibrating Predictive Algorithms to Achieve Risk Control | 39 | Emerging |
| 10 | imputr/imputr | Python library for easy and fast ML-based & conventional imputation techniques. | 36 | Emerging |
| 11 | TyMill/SynthPred | A Julia package for synthetic data analysis, advanced imputation (ARIMA,... | 36 | Emerging |
| 12 | feruzoripov/tsgap | Time-series missingness simulation separating mechanisms (MCAR/MAR/MNAR)... | 34 | Emerging |
| 13 | thibaultcordier/risk-control | A toolkit to calibrate predictive algorithms to achieve risk control. | 34 | Emerging |
| 14 | haghish/mlim | mlim: single and multiple imputation with automated machine learning | 33 | Emerging |
| 15 | DoubleML/DoubleMLReplicationCode | Replication of Simulations in Bach et al. (2024) - DoubleML - An... | 33 | Emerging |
| 16 | blind-contours/CVtreeMLE | :deciduous_tree: :dart: Cross Validated Decision Trees with Targeted Maximum... | 33 | Emerging |
| 17 | miriamspsantos/heterogeneous-distance-functions | A collection of heterogeneous distance functions handling missing values. | 25 | Experimental |
| 18 | missValTeam/Iscores | Scoring rules for missing values imputations (Michel et al., 2021) | 25 | Experimental |
| 19 | AmirhosseinHonardoust/Missing-Data-Doctor | Missing Data Doctor is a diagnostic and treatment toolkit for missing values... | 25 | Experimental |
| 20 | liangyuanhu/Variable-selection-w-missing-data | A general variable selection approach in the presence of missing data in... | 24 | Experimental |
| 21 | jannebor/dd_forecast | Code for predicting probabilities of threat for Data Deficient species of... | 24 | Experimental |
| 22 | fchamroukhi/FLaMingos | Functional Latent datA Models for clusterING heterogeneOus curveS | 22 | Experimental |
| 23 | Akchaykumar2004/Missing-Data-Doctor | 🩺 Diagnose and treat missing values in machine learning datasets with tools... | 22 | Experimental |
| 24 | miriamspsantos/synthetic-missing-data | A library for synthetic missing data generation. | 19 | Experimental |
| 25 | kennethleungty/DataWig-Missing-Data-Imputation | Imputation of Missing Data in Tables | 19 | Experimental |
| 26 | michelelagreca/Classification-On-Imputed-Data | Project of the 'Data and Information Quality' Course, aiming on describing... | 17 | Experimental |
| 27 | marcvidalbadia/functional-whitening | Online Material for Vidal and Aguilera (2022). Novel whitening approaches in... | 17 | Experimental |
| 28 | quansun98/MagicalRsq | MagicalRsq: Machine-learning-based genotype imputation quality calibration | 12 | Experimental |