zjunlp/Mol-Instructions
[ICLR 2024] Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset for Large Language Models
This project offers a specialized instruction dataset for training large language models (LLMs) to understand and generate information about biomolecules. Each example pairs a natural language instruction, optionally with a molecular structure as input, with a target output such as a molecular description, a predicted reaction, or a protein function. Scientists, researchers, and engineers working in drug discovery, materials science, or biotechnology can use it to adapt AI systems to complex biomolecular tasks.
294 stars. No commits in the last 6 months.
Use this if you are developing or fine-tuning large language models specifically for tasks involving small molecules, proteins, or biomolecular text analysis.
Not ideal if your primary interest is in general-purpose language models not focused on chemistry or biology, or if you require an off-the-shelf application rather than a dataset for model training.
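As a rough illustration of how an instruction-style record might be turned into a fine-tuning example, here is a minimal Python sketch. The field names (instruction, input, output), the sample record, and the prompt template are assumptions made for illustration; they are not taken from the project, so check the dataset's own documentation for the actual schema.

```python
from typing import Dict


def format_example(record: Dict[str, str]) -> Dict[str, str]:
    """Turn one instruction record into a prompt/completion pair.

    Assumes hypothetical keys "instruction", "input" (optional) and "output".
    The "### ..." template is a generic instruction-tuning layout, not one
    prescribed by the dataset.
    """
    instruction = record["instruction"].strip()
    model_input = record.get("input", "").strip()

    prompt = "### Instruction:\n" + instruction + "\n\n"
    if model_input:
        # Only add the input section when the record actually carries one.
        prompt += "### Input:\n" + model_input + "\n\n"
    prompt += "### Response:\n"

    return {"prompt": prompt, "completion": record["output"].strip()}


# Illustrative record only; it is not copied from the dataset.
sample_record = {
    "instruction": "Describe the molecule.",
    "input": "[C][C][O]",  # SELFIES string for ethanol
    "output": "The molecule is ethanol, a simple primary alcohol.",
}

pair = format_example(sample_record)
print(pair["prompt"] + pair["completion"])
```

Keeping the prompt and completion separate makes it easier to mask the prompt tokens from the loss during fine-tuning, if the training setup calls for that.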
Stars: 294
Forks: 15
Language: Python
License: MIT
Category:
Last pushed: Oct 28, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/zjunlp/Mol-Instructions"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
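The same request can be made from Python using only the standard library. This is a sketch under stated assumptions: the URL path is assumed to follow the category/owner/repo pattern seen in the curl command, and the response is assumed to be JSON, which the listing does not confirm. The helper names build_quality_url and fetch_quality are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL in the same shape as the curl example.

    The category/owner/repo path layout is inferred from that single example.
    """
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository.

    Assumes the endpoint returns a JSON body; json.loads raises an error
    if it returns something else.
    """
    request = Request(
        build_quality_url(category, owner, repo),
        headers={"Accept": "application/json"},
    )
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (performs a network request, so it is left commented out):
# data = fetch_quality("transformers", "zjunlp", "Mol-Instructions")
# print(data)
```

Given the stated limit of 100 keyless requests per day, caching responses locally is a sensible precaution when checking many repositories.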
Higher-rated alternatives
agentscope-ai/Trinity-RFT
Trinity-RFT is a general-purpose, flexible and scalable framework designed for reinforcement...
OpenRLHF/OpenRLHF
An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO &...
zjunlp/EasyEdit
[ACL 2024] An Easy-to-use Knowledge Editing Framework for LLMs.
huggingface/alignment-handbook
Robust recipes to align language models with human and AI preferences
hyunwoongko/nanoRLHF
nanoRLHF: from-scratch journey into how LLMs and RLHF really work.