msamogh/nonechucks

Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more!

Score: 49 / 100 (Emerging)

When working with large datasets for machine learning, you often encounter corrupted files or samples that don't meet your criteria. This tool helps data scientists and ML engineers handle these "bad" samples dynamically, without crashing their pipelines. It wraps your existing PyTorch datasets and skips or filters out problematic entries at load time, so training continues on the valid data.
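The core skip-on-failure idea can be illustrated without PyTorch. The following is a minimal pure-Python sketch, not nonechucks' actual API: `iter_valid`, `batches`, and `FlakyDataset` are hypothetical names. It assumes a map-style dataset (one exposing `__len__` and `__getitem__`, as PyTorch map-style datasets do) and treats a sample as bad if loading it raises an exception or returns `None`.

```python
from typing import Any, Iterator, List, Sequence


def iter_valid(dataset: Sequence[Any]) -> Iterator[Any]:
    """Hypothetical helper: yield samples lazily, skipping bad ones.

    A sample counts as bad if indexing raises an exception
    (e.g. a corrupted file) or returns None.
    """
    for idx in range(len(dataset)):
        try:
            sample = dataset[idx]
        except Exception:
            continue  # skip the unreadable sample instead of crashing
        if sample is not None:
            yield sample


def batches(dataset: Sequence[Any], batch_size: int) -> Iterator[List[Any]]:
    """Group valid samples into batches; only the last batch may be shorter."""
    batch: List[Any] = []
    for sample in iter_valid(dataset):
        batch.append(sample)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


class FlakyDataset:
    """Hypothetical toy dataset: index 2 is 'corrupted', index 4 returns None."""

    def __init__(self, n: int) -> None:
        self.n = n

    def __len__(self) -> int:
        return self.n

    def __getitem__(self, idx: int) -> Any:
        if idx == 2:
            raise OSError(f"cannot read sample {idx}")
        if idx == 4:
            return None
        return idx * 10


if __name__ == "__main__":
    print(list(batches(FlakyDataset(7), 2)))
```

Here `list(batches(FlakyDataset(7), 2))` yields `[[0, 10], [30, 50], [60]]`: the corrupted index 2 and the `None` at index 4 are dropped, and batches are refilled from later samples rather than coming out short.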

377 stars. No commits in the last 6 months. Available on PyPI.

Use this if you're a data scientist or ML engineer using PyTorch and frequently encounter corrupted files or unreadable images, or need to filter samples by content (for example, by language) at load time without pre-filtering the entire dataset.

Not ideal if your datasets are perfectly clean and don't require dynamic error handling or content-based filtering during loading.
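The "Transforms as Filters" idea from the tagline can be sketched the same way: a transform that returns `None` for unwanted samples lets the skip logic drop them during loading. This is again a hypothetical pure-Python illustration, not the library's API; `keep_language`, `TransformedDataset`, and `iter_valid` are illustrative names, and it assumes each sample already carries a `lang` label (in practice that label might come from a language detector).

```python
from typing import Any, Callable, Dict, Iterator, Optional, Sequence

Sample = Dict[str, str]
Transform = Callable[[Sample], Optional[Sample]]


def keep_language(lang: str) -> Transform:
    """Hypothetical transform factory: pass samples in `lang`, return None otherwise."""
    def transform(sample: Sample) -> Optional[Sample]:
        return sample if sample.get("lang") == lang else None
    return transform


class TransformedDataset:
    """Applies a transform at load time; a None result marks the sample as filtered."""

    def __init__(self, data: Sequence[Sample], transform: Transform) -> None:
        self.data = data
        self.transform = transform

    def __len__(self) -> int:
        return len(self.data)

    def __getitem__(self, idx: int) -> Optional[Sample]:
        return self.transform(self.data[idx])


def iter_valid(dataset: TransformedDataset) -> Iterator[Sample]:
    """Yield only samples the transform kept (i.e. did not turn into None)."""
    for idx in range(len(dataset)):
        sample = dataset[idx]
        if sample is not None:
            yield sample


if __name__ == "__main__":
    data = [
        {"text": "hello", "lang": "en"},
        {"text": "bonjour", "lang": "fr"},
        {"text": "thanks", "lang": "en"},
    ]
    dataset = TransformedDataset(data, keep_language("en"))
    print([s["text"] for s in iter_valid(dataset)])
```

With the three samples above, only the English ones (`"hello"` and `"thanks"`) come through; the underlying list is never pre-filtered, because the decision is made per sample as it is loaded.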

data-preprocessing machine-learning-engineering dataset-management data-quality computer-vision
Stale 6m
Maintenance 0 / 25
Adoption 10 / 25
Maturity 25 / 25
Community 14 / 25


Stars: 377
Forks: 27
Language: Python
License: MIT
Last pushed: Sep 22, 2022
Commits (30d): 0
Dependencies: 1

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/msamogh/nonechucks"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000 requests/day.