sbl-sdsc/mmtf-spark
Methods for the parallel and distributed analysis and mining of the Protein Data Bank using MMTF and Apache Spark.
This project helps structural biologists and biochemists efficiently analyze massive datasets of 3D protein structures, such as the entire Protein Data Bank (PDB). It takes raw PDB files or MMTF-formatted structural data and processes them in parallel to extract information such as polypeptide chain statistics, structural alignments, or metadata. Researchers who need to run large-scale computations over many protein structures will find it useful.
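As a rough illustration of the map/filter/reduce style of polypeptide chain statistics described above, here is a minimal sketch that uses plain Java parallel streams as a local stand-in for Spark. It does not use the mmtf-spark or Apache Spark APIs. The `Structure` class, its fields, the method names, and the placeholder IDs ("AAAA", "BBBB", "CCCC") are all hypothetical and chosen only for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ChainStats {

    // Hypothetical stand-in for a structure record: an ID plus the length
    // (number of residues) of each polypeptide chain. Not an mmtf-spark type.
    static final class Structure {
        final String pdbId;
        final int[] chainLengths;

        Structure(String pdbId, int... chainLengths) {
            this.pdbId = pdbId;
            this.chainLengths = chainLengths;
        }
    }

    // Counts chains whose length lies in [minLength, maxLength],
    // processing structures in parallel.
    static long countChainsInRange(List<Structure> structures, int minLength, int maxLength) {
        return structures.parallelStream()
                .flatMapToInt(s -> Arrays.stream(s.chainLengths))    // one element per chain
                .filter(len -> len >= minLength && len <= maxLength)  // keep chains in range
                .count();                                             // reduce to a total
    }

    // Mean chain length per structure, computed in parallel.
    // Structures without chains are skipped.
    static Map<String, Double> meanChainLength(List<Structure> structures) {
        return structures.parallelStream()
                .filter(s -> s.chainLengths.length > 0)
                .collect(Collectors.toConcurrentMap(
                        s -> s.pdbId,
                        s -> Arrays.stream(s.chainLengths).average().orElse(0.0)));
    }

    public static void main(String[] args) {
        // Placeholder data, not real PDB entries.
        List<Structure> structures = Arrays.asList(
                new Structure("AAAA", 120, 130),
                new Structure("BBBB", 45),
                new Structure("CCCC", 300, 310, 20));
        System.out.println(countChainsInRange(structures, 100, 400));
        System.out.println(meanChainLength(structures));
    }
}
```

In a real Spark job the same shape would be expressed as transformations on a distributed collection of structures loaded from MMTF data rather than on an in-memory list; the local version only shows the per-structure map, per-chain filter, and final reduction.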
No commits in the last 6 months.
Use this if you need to perform complex queries or analyses on a very large collection of protein 3D structures and require the computational power of distributed processing.
Not ideal if you are working with individual protein structures or small datasets that do not require distributed computing resources.
Stars: 21
Forks: 31
Language: Java
License: Apache-2.0
Last pushed: Feb 01, 2019
Commits (30d): 0
Higher-rated alternatives
lensacom/sparkit-learn
PySpark + Scikit-learn = Sparkit-learn
Angel-ML/angel
A Flexible and Powerful Parameter Server for large-scale machine learning
flink-extended/dl-on-flink
Deep Learning on Flink aims to integrate Flink and deep learning frameworks (e.g. TensorFlow,...
MingChen0919/learning-apache-spark
Notes on Apache Spark (pyspark)
mahmoudparsian/data-algorithms-book
MapReduce, Spark, Java, and Scala for Data Algorithms Book