rust-tokenizers and tokenizer

These are competitors offering overlapping functionality: both implement BPE tokenization in Rust, though rust-tokenizers provides a more comprehensive multi-algorithm tokenizer suite with significantly greater adoption and maintenance.

rust-tokenizers — score 61 (Established)
  Maintenance 10/25 · Adoption 19/25 · Maturity 16/25 · Community 16/25
  Stars: 336 · Forks: 33 · Downloads: 8,112 · Commits (30d): 0
  Language: Rust · License: Apache-2.0
  Flags: No Package, No Dependents

tokenizer — score 31 (Emerging)
  Maintenance 0/25 · Adoption 3/25 · Maturity 16/25 · Community 12/25
  Stars: 3 · Forks: 1 · Downloads: n/a · Commits (30d): 0
  Language: Rust · License: MIT
  Flags: Stale 6m, No Package, No Dependents

About rust-tokenizers

guillaume-be/rust-tokenizers

Rust-tokenizers offers high-performance tokenizers for modern language models, including WordPiece, Byte-Pair Encoding (BPE), and Unigram (SentencePiece) models.

This is a high-performance library that helps developers prepare text for use with large language models, such as BERT, GPT, and RoBERTa. It takes raw text input and converts it into numerical tokens, which are then fed into machine learning models. The primary users are developers building applications that process natural language, such as chatbots, sentiment analysis tools, or machine translation systems.
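The text-to-token-IDs step described above can be illustrated with a minimal sketch. This is not the rust-tokenizers API; the `encode` function, the toy whitespace splitting, and the `[UNK]` fallback id are illustrative assumptions (real tokenizers use subword algorithms such as BPE or WordPiece rather than whole-word lookup):

```rust
use std::collections::HashMap;

// Minimal sketch of the "raw text -> numerical tokens" step.
// Hypothetical helper, not the rust-tokenizers API: splits on whitespace
// and maps each token to a vocabulary id, falling back to an [UNK] id.
fn encode(text: &str, vocab: &HashMap<&str, u32>, unk_id: u32) -> Vec<u32> {
    text.split_whitespace()
        .map(|tok| *vocab.get(tok).unwrap_or(&unk_id))
        .collect()
}

fn main() {
    // Toy two-word vocabulary; id 99 stands in for [UNK].
    let vocab: HashMap<&str, u32> = [("hello", 0), ("world", 1)].into_iter().collect();
    let ids = encode("hello unknown world", &vocab, 99);
    println!("{:?}", ids); // [0, 99, 1]
}
```

A real model pipeline would feed these ids into the embedding layer; the library's value is doing this mapping with subword units so unknown words decompose into known pieces instead of collapsing to `[UNK]`.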

natural-language-processing machine-learning-engineering text-pre-processing AI-development computational-linguistics

About tokenizer

Usama3627/tokenizer

Implementation of BPE Tokenizer in Rust
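The core of a BPE tokenizer like this one is repeatedly merging the most frequent adjacent symbol pair. A minimal sketch of one merge step follows; the function names `most_frequent_pair` and `merge_pair` are ours, not taken from this repository:

```rust
use std::collections::HashMap;

// Count adjacent symbol pairs and return the most frequent one.
// Illustrative sketch of the core BPE idea, not this repo's code.
fn most_frequent_pair(symbols: &[String]) -> Option<(String, String)> {
    let mut counts: HashMap<(String, String), usize> = HashMap::new();
    for w in symbols.windows(2) {
        *counts.entry((w[0].clone(), w[1].clone())).or_insert(0) += 1;
    }
    counts.into_iter().max_by_key(|&(_, c)| c).map(|(pair, _)| pair)
}

// Fuse every non-overlapping occurrence of `pair` into one symbol.
fn merge_pair(symbols: &[String], pair: &(String, String)) -> Vec<String> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < symbols.len() {
        if i + 1 < symbols.len() && symbols[i] == pair.0 && symbols[i + 1] == pair.1 {
            out.push(format!("{}{}", pair.0, pair.1));
            i += 2;
        } else {
            out.push(symbols[i].clone());
            i += 1;
        }
    }
    out
}

fn main() {
    // "aaab" -> pairs ("a","a") x2, ("a","b") x1, so ("a","a") is merged first.
    let word: Vec<String> = "aaab".chars().map(|c| c.to_string()).collect();
    let pair = most_frequent_pair(&word).unwrap();
    let merged = merge_pair(&word, &pair);
    println!("{:?}", merged); // ["aa", "a", "b"]
}
```

Training a full BPE vocabulary repeats this step over a corpus until a target vocabulary size is reached, recording each merge so encoding can replay the merges in order.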

Scores updated daily from GitHub, PyPI, and npm data.