aqlaboratory/proteinnet
Standardized data set for machine learning of protein structure
This project provides standardized protein sequences, structures (secondary and tertiary), and multiple sequence alignments (MSAs), packaged as ready-to-use training, validation, and test sets for machine learning research on protein structure prediction. It is aimed at researchers in biochemistry and bioinformatics who are developing computational methods for predicting protein structure.
910 stars. No commits in the last 6 months.
Use this if you are developing machine learning models for protein structure prediction and need a standardized, historically accurate dataset for benchmarking your methods against established challenges.
Not ideal if you need immediate access to the raw MSA data for CASP 12, or if you want a tool that performs protein structure prediction rather than a dataset for training models.
Stars: 910
Forks: 138
Language: Python
License: MIT
Category:
Last pushed: Nov 18, 2020
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/aqlaboratory/proteinnet"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
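The same request can be made from Python. The sketch below uses only the standard library and mirrors the curl command above. The helper names `quality_url` and `fetch_quality` are hypothetical, and the assumption that other repositories follow the same `category/owner/repo` path layout is a guess based on this one URL. The response format is not documented here, so the body is returned as raw bytes rather than parsed.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base path taken from the curl example; everything after it is
# assumed to be <category>/<owner>/<repo>.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(owner: str, repo: str, category: str = "ml-frameworks") -> str:
    """Build the endpoint URL for one repository (hypothetical helper)."""
    # Percent-encode each path segment so unusual names stay valid.
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return "/".join([BASE_URL, *parts])


def fetch_quality(owner: str, repo: str, category: str = "ml-frameworks",
                  timeout: float = 10.0) -> bytes:
    """Fetch the raw response body without assuming its format."""
    with urlopen(quality_url(owner, repo, category), timeout=timeout) as resp:
        return resp.read()
```

Calling `fetch_quality("aqlaboratory", "proteinnet")` performs a real network request and counts against the daily request limit.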
Higher-rated alternatives
DeepRank/deeprank2
An open-source deep learning framework for data mining of protein-protein interfaces or...
sacdallago/biotrainer
Biological prediction models made simple.
jonathanking/sidechainnet
An all-atom protein structure dataset for machine learning.
a-r-j/ProteinWorkshop
Benchmarking framework for protein representation learning. Includes a large number of...
songlab-cal/tape
Tasks Assessing Protein Embeddings (TAPE), a set of five biologically relevant semi-supervised...