mikemayuare/apetokenizer
Tokenizer for chemical SMILES and SELFIES strings, for use with Transformers models.
This tool helps computational chemists and cheminformaticians prepare molecular structures for machine learning models. It takes chemical representations such as SMILES or SELFIES as input and converts them into a sequence of tokens that preserve chemical meaning. The resulting token sequence can be fed directly to popular machine learning models for tasks such as property prediction or drug discovery.
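To make the idea of chemically meaningful tokens concrete, here is a minimal sketch of atom-level SMILES tokenization. It is not apetokenizer's API or its algorithm: it uses a regular-expression split that is common in cheminformatics work, and the names `SMILES_TOKEN_PATTERN`, `tokenize_smiles`, and `build_vocab` are hypothetical, chosen only for illustration.

```python
import re

# Assumption: a commonly used atom-level SMILES pattern, not apetokenizer's own rules.
# Bracket atoms such as [Na+] stay whole, and two-letter halogens (Cl, Br) are not split.
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+\]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into atom-level tokens (hypothetical helper)."""
    tokens = SMILES_TOKEN_PATTERN.findall(smiles)
    # Refuse silently dropped characters: the tokens must rebuild the input exactly.
    if "".join(tokens) != smiles:
        raise ValueError(f"Unrecognized characters in SMILES: {smiles!r}")
    return tokens


def build_vocab(token_lists: list[list[str]]) -> dict[str, int]:
    """Assign an integer id to each distinct token, in order of first appearance."""
    vocab: dict[str, int] = {}
    for tokens in token_lists:
        for token in tokens:
            vocab.setdefault(token, len(vocab))
    return vocab


ethanol_tokens = tokenize_smiles("CCO")  # ['C', 'C', 'O']
vocab = build_vocab([ethanol_tokens])    # {'C': 0, 'O': 1}
ethanol_ids = [vocab[t] for t in ethanol_tokens]
```

Keeping `Cl` and bracket atoms as single tokens is what distinguishes this from a plain character split, which would break chlorine into a carbon and a stray `l`.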
No commits in the last 6 months.
Use this if you need to transform chemical compound strings into a structured format that deep learning models can understand, while retaining critical chemical information.
Not ideal if you are a bench chemist looking for molecular drawing software or a tool for basic chemical reaction stoichiometry.
Stars: 26
Forks: 4
Language: Python
License: not listed
Category:
Last pushed: Aug 22, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/mikemayuare/apetokenizer"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
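The same request can be sketched in Python with the standard library. The URL pattern is taken from the curl command above. The helper names `build_quality_url` and `fetch_quality` are hypothetical, as is reading the `transformers` path segment as a topic. The response format is not documented here, so the sketch returns the raw body as text rather than assuming a schema.

```python
import urllib.request

# Base path copied from the curl example; everything below it is an illustrative sketch.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(topic: str, repo: str) -> str:
    """Build the endpoint URL for a repository (hypothetical helper).

    `topic` is assumed to be the path segment before the repo name,
    e.g. "transformers"; `repo` is "owner/name".
    """
    return f"{BASE_URL}/{topic}/{repo}"


def fetch_quality(topic: str, repo: str, timeout: float = 10.0) -> str:
    """Fetch the raw response body as text; no response schema is assumed."""
    url = build_quality_url(topic, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_quality("transformers", "mikemayuare/apetokenizer")` issues the same GET request as the curl command and counts against the daily request limit.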
Higher-rated alternatives
rxn4chemistry/rxn-onmt-models: Training of OpenNMT-based RXN models
CTCycle/ADSMOD-Adsorption-Modeling: Streamline adsorption modeling by automatically fitting theoretical adsorption models to...
sanjaradylov/smiles-gpt: Generative Pre-Training from Molecules
lamalab-org/MatText: Text-based modeling of materials.
VectorInstitute/atomgen: Library for handling atomistic graph datasets focusing on transformer-based implementations,...