songlab-cal/tape

Tasks Assessing Protein Embeddings (TAPE), a set of five biologically relevant semi-supervised learning tasks spread across different domains of protein biology.

Overall score: 50 / 100 (Established)

This project helps biological researchers and biochemists analyze protein sequences by converting them into numerical representations called embeddings. You supply raw protein sequences (for example, from a FASTA file), and it outputs embeddings that can be used for tasks such as predicting protein structure, function, or evolutionary relationships. It is designed for scientists who work with large protein sequence datasets and want to apply machine learning methods to understand protein properties.
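The input side of that workflow can be illustrated with a minimal sketch that reads FASTA-formatted text into a mapping of record ID to sequence. This is not TAPE's own loader: the `read_fasta` helper and the example records are hypothetical and use only the Python standard library. The embedding step itself is not shown here.

```python
from typing import Dict, Iterable, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {record_id: sequence}.

    Hypothetical helper for illustration; not part of the TAPE codebase.
    """
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: the ID is the first whitespace-delimited token.
            parts = line[1:].split(maxsplit=1)
            current_id = parts[0] if parts else ""
            records[current_id] = ""
        elif current_id is not None:
            # Sequence data may wrap across several lines; join them.
            records[current_id] += line.upper()
    return records


# Made-up records, used only to exercise the parser.
example = """>seq1 hypothetical example record
GCTVEDRCLIGMGAILLNGC
VIGSGSLVAAGALITQ
>seq2
MKTAYIAKQR
""".splitlines()

sequences = read_fasta(example)
for name, seq in sequences.items():
    print(name, len(seq))
```

Each value in `sequences` is a plain amino-acid string, which is the kind of input an embedding model would then tokenize.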

733 stars. No commits in the last 6 months.

Use this if you are a researcher or bioinformatician looking to generate robust, pre-trained numerical embeddings for protein sequences to facilitate downstream machine learning tasks in protein biology.

Not ideal if your primary goal is to reproduce the exact results from the original TAPE paper, as this updated codebase prioritizes ease of use and future development over strict reproducibility of past benchmarks.

protein-sequences bioinformatics protein-engineering computational-biology protein-function-prediction
Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 24 / 25


Stars: 733
Forks: 133
Language: Python
License: BSD-3-Clause
Last pushed: Dec 11, 2022
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/songlab-cal/tape"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
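
The same request can be sketched in Python with the standard library. Two things here are assumptions rather than documented behavior: the `category/owner/repo` path pattern is inferred from the single URL shown above, and the response is assumed to be JSON. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, repo: str) -> str:
    """Build the endpoint URL; path pattern inferred from the curl example."""
    return f"{BASE_URL}/{category}/{repo}"


def fetch_quality(category: str, repo: str, timeout: float = 10.0):
    """Fetch and decode the response, assuming the API returns JSON."""
    url = quality_url(category, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Only prints the URL; call fetch_quality(...) to make the request.
    print(quality_url("ml-frameworks", "songlab-cal/tape"))
```

Because unauthenticated use is limited to 100 requests per day, it may be worth caching the result of `fetch_quality` rather than calling it repeatedly.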