msamogh/nonechucks

Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more!

Emerging

Large machine learning datasets often contain corrupted files or samples that don't meet your criteria. This tool lets data scientists and ML engineers handle these 'bad' samples at load time instead of letting them crash the pipeline. It takes your existing PyTorch datasets and automatically skips or filters out problematic entries, so training continues on the valid data.
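The skipping behavior described above can be illustrated with a small, dependency-free sketch. This is not the library's implementation or API: `FlakyDataset` and `SkippingDataset` are hypothetical names invented for this example, and the sketch assumes a "bad" sample is one whose lookup raises an exception.

```python
class FlakyDataset:
    """Hypothetical map-style dataset in which some samples fail to load."""

    def __init__(self, items):
        self.items = items

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        item = self.items[idx]
        if item is None:
            # Stand-in for a corrupted file or unreadable image.
            raise ValueError(f"sample {idx} is corrupted")
        return item


class SkippingDataset:
    """Hypothetical wrapper that yields only the samples that load cleanly."""

    def __init__(self, dataset):
        self.dataset = dataset
        self.bad_indices = set()  # remembered so later passes skip them cheaply

    def __iter__(self):
        for idx in range(len(self.dataset)):
            if idx in self.bad_indices:
                continue
            try:
                sample = self.dataset[idx]
            except Exception:
                # Record the failure and move on instead of crashing.
                self.bad_indices.add(idx)
                continue
            yield sample


raw = FlakyDataset(["a", None, "b", None, "c"])
safe = SkippingDataset(raw)
valid_samples = list(safe)
```

Recording the failing indices means a second pass over the data does not retry samples already known to be bad, which matters when each failed load is expensive.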

377 stars. No commits in the last 6 months. Available on PyPI.

Use this if you're a data scientist or ML engineer using PyTorch who frequently encounters corrupted files or unreadable images, or who needs to filter data based on content (such as language) during data loading without pre-filtering the entire dataset.

Not ideal if your datasets are perfectly clean and don't require dynamic error handling or content-based filtering during loading.
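Content-based filtering during loading, as opposed to pre-filtering the whole dataset, might look like the sketch below. It is a plain-Python illustration of treating a transform as a filter, not the library's API: `RejectSample`, `keep_if`, and `load_valid` are hypothetical names, and the ASCII check is only a stand-in for a real content test such as language detection.

```python
class RejectSample(Exception):
    """Hypothetical signal that a sample should be filtered out."""


def keep_if(predicate):
    """Turn a boolean predicate into a transform that rejects failing samples."""
    def transform(sample):
        if not predicate(sample):
            raise RejectSample(sample)
        return sample
    return transform


def load_valid(samples, transform):
    """Lazily apply `transform`, skipping any sample it rejects."""
    for sample in samples:
        try:
            yield transform(sample)
        except RejectSample:
            continue


# Assumed content check: keep only pure-ASCII strings.
ascii_only = keep_if(lambda text: text.isascii())

texts = ["hello", "héllo", "world"]
kept = list(load_valid(texts, ascii_only))
```

Because `load_valid` is a generator, each sample is checked only when it is requested, so no separate filtering pass over the full dataset is needed.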

data-preprocessing machine-learning-engineering dataset-management data-quality computer-vision

Stale 6m

Maintenance 0 / 25

Adoption 10 / 25

Maturity 25 / 25

Community 14 / 25

Stars

377

Forks

Language

Python

License

MIT

Higher-rated alternatives

skrub-data/skrub

Machine learning with dataframes

biolab/orange3

🍊 📊 💡 Orange: Interactive data analysis

root-project/root

The official repository for ROOT: analyzing, storing and visualizing big data, scientifically

cleanlab/cleanlab

Cleanlab's open-source library is the standard data-centric AI package for data quality and...

drivendataorg/deon

A command line tool to easily add an ethics checklist to your data science projects.
