tokenizers and gotokenizers
These are ecosystem siblings: gotokenizers is a pure Go reimplementation of the tokenizer algorithms standardized and popularized by the Rust-based huggingface/tokenizers reference implementation, letting Go developers apply the same tokenization logic in production without a Rust or Python dependency.
About tokenizers
huggingface/tokenizers
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
When working with large volumes of text for natural language processing, this library converts raw text into a format that machine learning models can understand. It splits input documents into tokens (words or sub-word units) and maps each token to an integer ID from a learned vocabulary. This is essential for AI researchers and machine learning engineers building or fine-tuning language models.
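The vocabulary-and-token-IDs pipeline can be sketched in plain Python. This is a deliberately simplified illustration, not the huggingface/tokenizers API: real tokenizers use subword algorithms such as BPE or WordPiece rather than whitespace splitting, but the input/output shape (raw text in, integer IDs out) is the same. The function names and the `[UNK]` placeholder are assumptions for this sketch.

```python
# Toy tokenizer sketch: build a vocabulary from a corpus, then
# encode raw text as the numerical token IDs a model consumes.
# Production libraries use subword algorithms (BPE, WordPiece)
# instead of this whitespace split.

def build_vocab(corpus):
    """Map each distinct whitespace-separated word to an integer ID."""
    vocab = {"[UNK]": 0}  # reserve ID 0 for out-of-vocabulary words
    for text in corpus:
        for word in text.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab

def encode(text, vocab):
    """Convert raw text into a list of token IDs."""
    return [vocab.get(word, vocab["[UNK]"]) for word in text.lower().split()]

corpus = ["the cat sat", "the dog ran"]
vocab = build_vocab(corpus)
print(encode("the cat ran fast", vocab))  # "fast" falls back to [UNK] → [1, 2, 5, 0]
```

Subword tokenizers improve on this by breaking rare words into smaller known pieces, so fewer inputs fall back to the unknown token.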
About gotokenizers
nlpodyssey/gotokenizers
Go implementation of today's most used tokenizers
This is a foundational library for Go developers building applications that process human language. It takes raw text and converts it into numerical tokens, which are required for feeding text into machine learning models for tasks like translation or sentiment analysis. The output is a structured sequence of tokens, ready for further natural language processing, and it integrates directly into Go-based systems with no foreign-language bindings.