zjunlp/Mol-Instructions

[ICLR 2024] Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset for Large Language Models

/ 100

Emerging

This project provides a specialized instruction dataset for training large language models (LLMs) to understand and generate information about biomolecules. Its examples pair natural language instructions or molecular structures as input with outputs such as molecular descriptions, predicted reactions, or protein functions. Scientists, researchers, and engineers working in drug discovery, materials science, or biotechnology can use it to improve AI systems on complex biomolecular tasks.

294 stars. No commits in the last 6 months.

Use this if you are developing or fine-tuning large language models specifically for tasks involving small molecules, proteins, or biomolecular text analysis.

Not ideal if you are mainly interested in general-purpose language models with no chemistry or biology focus, or if you need an off-the-shelf application rather than a dataset for model training.

drug-discovery materials-science bioinformatics chemoinformatics protein-engineering

Stale 6m No Package No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 11 / 25

Stars

294

Forks

Language

Python

License

MIT

Higher-rated alternatives

agentscope-ai/Trinity-RFT

Trinity-RFT is a general-purpose, flexible and scalable framework designed for reinforcement...

OpenRLHF/OpenRLHF

An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO &...

zjunlp/EasyEdit

[ACL 2024] An Easy-to-use Knowledge Editing Framework for LLMs.

huggingface/alignment-handbook

Robust recipes to align language models with human and AI preferences

hyunwoongko/nanoRLHF

nanoRLHF: from-scratch journey into how LLMs and RLHF really work.
