scDataset/scDataset
scDataset: Scalable Data Loading for Deep Learning on Large-Scale Single-Cell Omics
scDataset helps single-cell omics researchers load and process massive datasets efficiently for deep learning. You provide single-cell data, such as gene expression or protein measurements, in formats like AnnData or NumPy arrays, and the tool yields batches of that data ready for training deep learning models. It is aimed at scientists and computational biologists working with very large single-cell genomics or proteomics datasets.
Available on PyPI.
Use this if you are a single-cell biologist or computational scientist training deep learning models on single-cell omics datasets that are too large to fit into memory.
Not ideal if your datasets are small enough to be loaded entirely into memory, or if you are not using deep learning for single-cell omics analysis.
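To make the idea of batch-wise loading concrete, here is a minimal sketch in plain NumPy. It is not scDataset's API or its actual algorithm; the function name iter_block_shuffled_batches and its parameters are hypothetical, chosen only for illustration. The sketch shuffles contiguous blocks of rows rather than single rows, then yields mini-batches, which keeps reads mostly sequential when the array is disk-backed.

```python
import numpy as np


def iter_block_shuffled_batches(data, batch_size, block_size, seed=0):
    """Yield mini-batches of rows from `data` in block-shuffled order.

    Hypothetical helper for illustration only (not part of scDataset).
    `data` can be any 2-D array that supports integer-array row indexing,
    for example an in-memory ndarray or a disk-backed np.memmap.
    """
    rng = np.random.default_rng(seed)
    n_rows = data.shape[0]

    # Shuffle the start offsets of contiguous blocks instead of single rows.
    block_starts = np.arange(0, n_rows, block_size)
    rng.shuffle(block_starts)

    # Expand the shuffled blocks back into one long row order.
    order = np.concatenate(
        [np.arange(s, min(s + block_size, n_rows)) for s in block_starts]
    )

    for start in range(0, n_rows, batch_size):
        # Sort indices within a batch so reads go in ascending row order.
        idx = np.sort(order[start:start + batch_size])
        yield np.asarray(data[idx])


# Toy input: 10 "cells" x 2 "genes", small enough to inspect by hand.
cells = np.arange(20, dtype=np.float32).reshape(10, 2)
batches = list(iter_block_shuffled_batches(cells, batch_size=4, block_size=2))
```

With a real dataset, the same function could be handed a disk-backed array so that only the rows of the current batch are read into memory at a time.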
Stars: 43
Forks: 2
Language: Jupyter Notebook
License: MIT
Category:
Last pushed: Jan 30, 2026
Commits (30d): 0
Dependencies: 2
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/scDataset/scDataset"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
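The same request can be sketched in Python with only the standard library. Two things here are assumptions rather than documented behavior: that the URL follows a category/owner/repo pattern (inferred from the single curl example above) and that the response body is JSON. The helper names build_quality_url and fetch_quality are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Base path taken from the curl example; everything after it is assumed
# to follow a <category>/<owner>/<repo> pattern.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category, repo):
    """Build the endpoint URL for a repo (hypothetical helper)."""
    category_part = urllib.parse.quote(category)
    repo_part = urllib.parse.quote(repo, safe="/")  # keep the owner/repo slash
    return f"{BASE_URL}/{category_part}/{repo_part}"


def fetch_quality(category, repo, timeout=10.0):
    """Fetch and decode the response, assuming the endpoint returns JSON."""
    url = build_quality_url(category, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


url = build_quality_url("ml-frameworks", "scDataset/scDataset")
```

Calling fetch_quality("ml-frameworks", "scDataset/scDataset") performs the network request and counts against the daily request limit.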
Higher-rated alternatives
scverse/scanpy
Single-cell analysis in Python. Scales to >100M cells.
scverse/scvi-tools
Deep probabilistic analysis of single-cell and spatial omics data
Teichlab/celltypist
A tool for semi-automatic cell type classification
theislab/scarches
Reference mapping for single-cell genomics
Teichlab/cellhint
A tool for semi-automatic cell type harmonization and integration