AdaptInfer/CompBioDatasetsForMachineLearning
A Curated List of Computational Biology Datasets Suitable for Machine Learning
This is a curated collection of computational biology datasets prepared for machine learning tasks. It helps researchers and data scientists in fields like genomics, proteomics, and medical imaging find suitable input data, such as genetic sequences, protein expression levels, medical images, or electronic health records, for building predictive models and analyzing biological systems. Because the datasets are ready to use, users can skip much of the usual pre-processing.
197 stars. No commits in the last 6 months.
Use this if you are a computational biologist, bioinformatician, or data scientist looking for pre-processed, high-quality computational biology datasets to train machine learning models for tasks like disease prediction, drug discovery, or genomic analysis.
Not ideal if you need raw, unprocessed biological data for custom analysis or if your work does not involve machine learning applications.
Stars: 197
Forks: 26
Language: —
License: —
Category:
Last pushed: Apr 19, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/AdaptInfer/CompBioDatasetsForMachineLearning"
Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
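The same request can be made from a script. The following is a minimal Python sketch, not an official client: the helper names `build_quality_url` and `fetch_quality` are hypothetical, the path layout (category, owner, repository) is inferred from the single curl URL above, and the assumption that the response body is JSON is unconfirmed by this page.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build an endpoint URL shaped like the one in the curl command.

    The category/owner/repo layout is inferred from that one example URL.
    """
    # Percent-encode each path segment so unusual characters stay valid.
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Send a GET request and decode the body.

    Assumption: the endpoint returns JSON. If it does not,
    json.loads raises an error (json.JSONDecodeError).
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("ml-frameworks", "AdaptInfer", "CompBioDatasetsForMachineLearning")` issues the same request as the curl command. Since the keyless tier allows 100 requests/day, a script that polls many repositories may want to cache results locally.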
Higher-rated alternatives
open-edge-platform/datumaro
Dataset Management Framework, a Python library and a CLI tool to build, analyze and manage...
explosion/ml-datasets
🌊 Machine learning dataset loaders for testing and example scripts
webdataset/webdataset
A high-performance Python-based I/O system for large (and small) deep learning problems, with...
tensorflow/datasets
TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...
mlcommons/croissant
Croissant is a high-level format for machine learning datasets that brings together four rich layers.