ML4GLand/SeqData
Annotated sequence data
This project helps bioinformaticians and computational biologists prepare genomic sequence data for machine learning. It reads raw data from common formats such as FASTA, BigWig, and BAM files, then organizes it into a single structured object ready for model training. It is aimed at researchers who build predictive models on genetic sequences.
No commits in the last 6 months.
Use this if you need to efficiently load and manage large genomic datasets, including sequences, coverage, and metadata, to train machine learning models.
Not ideal if you are not working with genomic sequence data or do not plan to use machine learning for your analysis.
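To make the idea of "a single, structured object" concrete, here is a minimal sketch in plain Python. It does not use SeqData's own API; the function names (`parse_fasta`, `one_hot`, `build_dataset`) and the dictionary layout are hypothetical and only illustrate the general pattern of combining sequences, an encoded form, and per-sequence coverage under shared keys.

```python
from io import StringIO

ALPHABET = "ACGT"


def parse_fasta(handle):
    """Yield (name, sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first token of the header as the record name.
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def one_hot(seq):
    """Encode a sequence as 4-element rows; bases outside ACGT become all zeros."""
    return [[1 if base == letter else 0 for letter in ALPHABET] for base in seq]


def build_dataset(handle, coverage=None):
    """Bundle names, sequences, encodings, and optional coverage in one dict.

    `coverage` is a hypothetical mapping from sequence name to a list of
    per-base values; names without an entry get None.
    """
    coverage = coverage or {}
    names, seqs = [], []
    for name, seq in parse_fasta(handle):
        names.append(name)
        seqs.append(seq)
    return {
        "name": names,
        "seq": seqs,
        "one_hot": [one_hot(s) for s in seqs],
        "coverage": [coverage.get(n) for n in names],
    }


# Illustrative input only; not data from the project.
fasta_text = ">seq1 demo\nACGT\nNA\n>seq2\nGG\n"
dataset = build_dataset(StringIO(fasta_text), coverage={"seq1": [0, 1, 2, 3, 0, 1]})
```

A real pipeline for large genomes would likely rely on array-backed, on-disk storage rather than nested Python lists; the sketch only shows how the pieces line up by index.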
Stars: 11
Forks: not listed
Language: Jupyter Notebook
License: MIT
Category: not listed
Last pushed: Feb 02, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/ML4GLand/SeqData"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
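The same request can be made from Python with the standard library. This is a sketch under assumptions: the helper names `build_quality_url` and `fetch_quality` are hypothetical, the reading of the path as category, owner, and repository is inferred from the single URL shown above, and the response format is not documented here, so the body is returned as raw text decoded as UTF-8 rather than parsed.

```python
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category, owner, repo):
    """Build the endpoint URL; the category/owner/repo layout is an assumption."""
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category, owner, repo, timeout=10.0):
    """Return the raw response body as text; parsing is left to the caller."""
    with urlopen(build_quality_url(category, owner, repo), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Building the URL needs no network access; fetch_quality() performs the request.
url = build_quality_url("ml-frameworks", "ML4GLand", "SeqData")
```

Calling `fetch_quality("ml-frameworks", "ML4GLand", "SeqData")` counts against the daily request limit described above.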
Higher-rated alternatives
helicalAI/helical
A framework for state-of-the-art pre-trained bio foundation models on genomics and...
instadeepai/nucleotide-transformer
Foundation Models for Genomics & Transcriptomics
ML-Bioinfo-CEITEC/genomic_benchmarks
Benchmarks for classification of genomic sequences
FunctionLab/selene
a framework for training sequence-level deep learning networks
modernatx/seqlike
Unified biological sequence manipulation in Python