songlab-cal/tape
Tasks Assessing Protein Embeddings (TAPE), a set of five biologically relevant semi-supervised learning tasks spread across different domains of protein biology.
This project helps biological researchers and biochemists analyze protein sequences by converting them into numerical representations called embeddings. It takes raw protein sequences as input (for example, from a FASTA file) and outputs embeddings that can feed downstream tasks such as predicting protein structure, function, or evolutionary relationships. It is designed for scientists who work with large protein sequence datasets and want to apply machine learning methods to understand protein properties.
733 stars. No commits in the last 6 months.
Use this if you are a researcher or bioinformatician looking to generate robust, pre-trained numerical embeddings for protein sequences to facilitate downstream machine learning tasks in protein biology.
Not ideal if your primary goal is to reproduce the exact results from the original TAPE paper, as this updated codebase prioritizes ease of use and future development over strict reproducibility of past benchmarks.
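The sequence-to-embedding workflow described above can be illustrated with a minimal, standard-library-only sketch. It does not use TAPE's pretrained models or its API: the helper names `read_fasta` and `composition_vector` are hypothetical, and the 20-dimensional residue-frequency vector is only a toy stand-in for a learned embedding. It is meant to show the shape of the data flow (FASTA text in, one fixed-length numeric vector per sequence out), under those assumptions.

```python
from io import StringIO

# The 20 standard amino acids, in a fixed order that defines the vector layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def read_fasta(handle):
    """Yield (identifier, sequence) pairs from a FASTA-formatted text stream."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def composition_vector(sequence):
    """Toy embedding (NOT TAPE's model): frequency of each standard residue."""
    counts = [sequence.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


# Hypothetical two-record FASTA input, held in memory for the example.
example = StringIO(">seq1 toy example\nACDA\nAC\n>seq2\nWWYY\n")
embeddings = {name: composition_vector(seq) for name, seq in read_fasta(example)}
```

In a real pipeline, the call to `composition_vector` would be replaced by a pretrained model's forward pass; the surrounding structure (one identifier mapped to one numeric vector) would likely stay similar.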
Stars: 733
Forks: 133
Language: Python
License: BSD-3-Clause
Category:
Last pushed: Dec 11, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/songlab-cal/tape"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
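The curl request above could be issued from Python roughly as follows. This is a sketch under assumptions: the `category/owner/repo` path layout is inferred from the single URL shown, the response is assumed to be JSON (the page does not say), and `quality_url` and `fetch_quality` are hypothetical helper names.

```python
import json
from urllib.request import urlopen

# Base path taken from the curl example; everything after it is assumed
# to follow the pattern <category>/<owner>/<repo>.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL for one repository."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10):
    """Fetch the record for one repository, assuming a JSON response body."""
    with urlopen(quality_url(category, owner, repo), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (performs a network request, so it is left commented out):
# data = fetch_quality("ml-frameworks", "songlab-cal", "tape")
# print(json.dumps(data, indent=2))
```

With the keyless limit of 100 requests/day, caching the returned data locally is a reasonable choice when the same repository is queried repeatedly.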
Related frameworks
DeepRank/deeprank2
An open-source deep learning framework for data mining of protein-protein interfaces or...
sacdallago/biotrainer
Biological prediction models made simple.
jonathanking/sidechainnet
An all-atom protein structure dataset for machine learning.
BioinfoMachineLearning/DIPS-Plus
The Enhanced Database of Interacting Protein Structures for Interface Prediction
a-r-j/ProteinWorkshop
Benchmarking framework for protein representation learning. Includes a large number of...