tbepler/prose

Multi-task and masked language model-based protein sequence embedding models.


Emerging

This project converts raw protein sequences into numerical representations called embeddings, for use by biological researchers and computational biologists. You provide protein sequences, typically in FASTA format, and it writes a file of embeddings that can feed downstream tasks such as predicting protein function or structure. It is aimed at users who need to process large sets of protein data computationally.
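To make the FASTA-in, embeddings-out workflow concrete, here is a minimal sketch of its shape. It does not use this project's models or API: `read_fasta`, `composition_embedding`, and `embed_fasta` are hypothetical helpers, and the "embedding" is a simple amino-acid composition vector standing in for a learned language-model embedding.

```python
from io import StringIO

# The 20 standard amino acids; each gets one slot in the toy embedding.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA-formatted text stream."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def composition_embedding(sequence):
    """Toy stand-in for a learned embedding: fraction of each standard residue."""
    counts = [sequence.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts) or 1  # avoid dividing by zero on empty sequences
    return [c / total for c in counts]


def embed_fasta(handle):
    """Map each sequence name in a FASTA stream to its fixed-length vector."""
    return {name: composition_embedding(seq) for name, seq in read_fasta(handle)}


# Hypothetical two-sequence input; the second record spans two lines.
demo = StringIO(">seq1 demo\nACDA\n>seq2\nMKV\nMK\n")
embeddings = embed_fasta(demo)
```

Each sequence, whatever its length, maps to a vector of the same size, which is what lets downstream machine-learning code treat proteins as ordinary feature rows. A real embedding model would replace `composition_embedding` with a forward pass through the trained network.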

106 stars. No commits in the last 6 months.

Use this if you need to transform raw protein sequences into a numerical format suitable for machine learning or other computational analyses in biology.

Not ideal if you are looking for a tool to directly predict protein structures or functions without needing to work with numerical embeddings.

protein-science bioinformatics computational-biology protein-engineering structural-biology

Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 9 / 25

Maturity 16 / 25

Community 19 / 25


Stars: 106

Forks:

Language: Python

License: —

Higher-rated alternatives

BernhoferM/TMbed

Transmembrane proteins predicted through Language Model embeddings

sacdallago/bio_embeddings

Get protein embeddings from protein sequences

Rostlab/VESPA

VESPA is a simple, yet powerful Single Amino Acid Variant (SAV) effect predictor based on...

DeepRank/DeepRank-GNN-esm

Graph Network for protein-protein interface including language model features

bschilder/VEP_protein

Using Protein Language Models to compute Variant Effect Predictions across population-scale populations.
