sugarme/tokenizer
NLP tokenizers written in Go
This library helps Go developers prepare text data for Natural Language Processing (NLP) models. It breaks raw text into words or sub-word tokens, along with their offsets in the original text, making the output ready for use in machine-learning pipelines such as training or inference.
Use this if you are a Go developer building NLP applications and need to preprocess text by converting it into tokens for tasks like training or inference.
Not ideal if you are not a Go developer or if you need a pre-built NLP solution without needing to integrate a tokenizer into your Go application.
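To make the idea concrete, here is a minimal sketch of what tokenization with offsets looks like. This is not the sugarme/tokenizer API; it is a self-contained whitespace pre-tokenizer (the simplest kind of splitter, which real tokenizers refine with sub-word models) written to illustrate the token-plus-position output described above.

```go
package main

import (
	"fmt"
	"unicode"
)

// Token pairs a surface string with its byte offsets in the original
// text, mirroring the token+position output a tokenizer produces.
type Token struct {
	Text       string
	Start, End int // byte offsets, half-open [Start, End)
}

// whitespaceTokenize splits on Unicode whitespace and records each
// token's byte offsets. Illustrative only, not the library's API.
func whitespaceTokenize(s string) []Token {
	var tokens []Token
	start := -1
	for i, r := range s {
		if unicode.IsSpace(r) {
			if start >= 0 {
				tokens = append(tokens, Token{s[start:i], start, i})
				start = -1
			}
		} else if start < 0 {
			start = i
		}
	}
	if start >= 0 {
		tokens = append(tokens, Token{s[start:], start, len(s)})
	}
	return tokens
}

func main() {
	for _, t := range whitespaceTokenize("Go makes NLP fast") {
		fmt.Printf("%q [%d:%d]\n", t.Text, t.Start, t.End)
	}
}
```

A production tokenizer layers sub-word splitting (e.g. BPE or WordPiece) on top of a pre-tokenizer like this one, but the shape of the result is the same: token text plus its position in the input.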
Stars: 316
Forks: 61
Language: Go
License: Apache-2.0
Last pushed: Nov 27, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/sugarme/tokenizer"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Related tools
guillaume-be/rust-tokenizers
Rust-tokenizer offers high-performance tokenizers for modern language models, including...
elixir-nx/tokenizers
Elixir bindings for 🤗 Tokenizers
openscilab/tocount
ToCount: Lightweight Token Estimator
reinfer/blingfire-rs
Rust wrapper for the BlingFire tokenization library
Scurrra/ubpe
Universal (general sequence) Byte-Pair Encoding