zjunlp/Mol-Instructions

[ICLR 2024] Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset for Large Language Models

Quality score: 37 / 100 (Emerging)

This project offers a specialized dataset for training large language models (LLMs) to understand and generate information about biomolecules. Its examples pair natural-language instructions, often accompanied by a molecular structure, with target outputs such as molecular descriptions, predicted reactions, or protein functions. Researchers and engineers working in drug discovery, materials science, or biotechnology can use it to fine-tune AI systems for complex biomolecular tasks.

294 stars. No commits in the last 6 months.

Use this if you are developing or fine-tuning large language models specifically for tasks involving small molecules, proteins, or biomolecular text analysis.

Not ideal if your primary interest is in general-purpose language models not focused on chemistry or biology, or if you require an off-the-shelf application rather than a dataset for model training.

drug-discovery materials-science bioinformatics chemoinformatics protein-engineering
Flags: Stale (6 months), No package, No dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 11 / 25
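The four subscores above add up exactly to the overall 37 / 100 rating, which suggests (an assumption, not documented on this page) that the overall score is a plain sum of four components worth up to 25 points each. A minimal Python check of that arithmetic; `overall_score` and `SUBSCORE_MAX` are hypothetical names:

```python
# Hypothetical sketch: assumes the overall quality score is the plain sum of
# four subscores, each capped at 25 (inferred from the figures on this page).
SUBSCORE_MAX = 25

subscores = {
    "Maintenance": 0,
    "Adoption": 10,
    "Maturity": 16,
    "Community": 11,
}


def overall_score(parts: dict[str, int]) -> int:
    """Sum the subscores after checking each lies in [0, SUBSCORE_MAX]."""
    for name, value in parts.items():
        if not 0 <= value <= SUBSCORE_MAX:
            raise ValueError(f"{name} subscore {value} outside 0..{SUBSCORE_MAX}")
    return sum(parts.values())


total = overall_score(subscores)
max_total = SUBSCORE_MAX * len(subscores)
print(f"{total} / {max_total}")  # prints "37 / 100"
```

Under this assumption, the zero Maintenance subscore alone accounts for a quarter of the possible points the project is missing.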


Stars: 294
Forks: 15
Language: Python
License: MIT
Last pushed: Oct 28, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/zjunlp/Mol-Instructions"

Open to everyone: 100 requests/day without a key, or 1,000/day with a free key.
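The same request can be made from Python using only the standard library. This is a minimal sketch: the `quality_url` and `fetch_quality` helpers are hypothetical, the URL layout (`/quality/<category>/<owner>/<repo>`) is inferred from the single example above, and the assumption that the endpoint returns JSON is not confirmed on this page. Sending an API key is omitted because the page does not say how keys are passed.

```python
import json
import urllib.request

# Assumption: the "<category>/<owner>/<repo>" layout is inferred from the one
# example URL on this page; other categories or paths may differ.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the (assumed) quality-endpoint URL for a repository."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """GET the endpoint and decode the body as JSON (assumed response format)."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Only builds the URL; call fetch_quality(...) to make the network request.
    print(quality_url("transformers", "zjunlp", "Mol-Instructions"))
```

Keeping URL construction separate from the network call makes the path logic easy to check offline and keeps each request within the unauthenticated daily limit.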