tbepler/prose
Multi-task and masked language model-based protein sequence embedding models.
This project helps biological researchers and computational biologists analyze protein sequences by converting them into numerical representations called embeddings. You provide sequences, typically in FASTA format, and it writes a file of embeddings that can feed downstream computational tasks such as predicting protein function or structure. It is aimed at users who need to process large sets of protein data computationally.
106 stars. No commits in the last 6 months.
Use this if you need to transform raw protein sequences into a numerical format suitable for machine learning or other computational analyses in biology.
Not ideal if you are looking for a tool to directly predict protein structures or functions without needing to work with numerical embeddings.
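The FASTA-in, embeddings-out workflow described above can be sketched in a few lines of Python. This is only an illustration of the data flow, not the project's model or API: the functions `parse_fasta`, `composition_vector`, and `embed_fasta` are hypothetical names, and a simple 20-dimensional amino-acid composition vector stands in for the learned embeddings a language model would produce.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Tuple

# The 20 standard amino acids; this order defines the vector dimensions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].strip()
            name = header.split()[0] if header else ""
            chunks = []
        elif name is not None:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def composition_vector(seq: str) -> List[float]:
    """Stand-in 'embedding': fraction of each standard amino acid.

    Hypothetical placeholder; a real embedding model would return
    learned, typically much higher-dimensional, vectors.
    """
    counts = Counter(seq)
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def embed_fasta(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Map each sequence name to its fixed-length numeric vector."""
    return {name: composition_vector(seq) for name, seq in parse_fasta(lines)}
```

Calling `embed_fasta(open("proteins.fasta"))` (a hypothetical file name) returns one fixed-length vector per sequence, which is the shape of input that downstream machine learning code generally expects.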
Stars: 106
Forks: 21
Language: Python
License: not listed
Category: not listed
Last pushed: Jun 16, 2021
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/tbepler/prose"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
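The same request can be made from Python using only the standard library. This is a minimal sketch: the helper names `quality_url` and `fetch_quality` are hypothetical, and the code assumes the endpoint returns a JSON body, which the page does not state.

```python
import json
import urllib.request

# Base path taken from the curl example; the owner and repo are appended.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/embeddings"


def quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository."""
    return f"{BASE_URL}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch and decode the record for one repository.

    Assumption: the response body is JSON. If it is not,
    json.loads raises a ValueError (json.JSONDecodeError).
    """
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `fetch_quality("tbepler", "prose")` requests the same URL as the curl command. Without a key, calls count against the 100 requests/day limit.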
Higher-rated alternatives
BernhoferM/TMbed
Transmembrane proteins predicted through Language Model embeddings
sacdallago/bio_embeddings
Get protein embeddings from protein sequences
Rostlab/VESPA
VESPA is a simple, yet powerful Single Amino Acid Variant (SAV) effect predictor based on...
DeepRank/DeepRank-GNN-esm
Graph Network for protein-protein interface including language model features
bschilder/VEP_protein
Using Protein Language Models to compute Variant Effect Predictions across population-scale populations.