rust-tokenizers and tokenizer

These are competitors offering overlapping functionality: both implement BPE tokenization in Rust, though rust-tokenizers provides a more comprehensive multi-algorithm tokenizer suite with significantly greater adoption and maintenance.

rust-tokenizers — score 61 (Established)
  Maintenance 10/25 · Adoption 19/25 · Maturity 16/25 · Community 16/25
  Stars: 336 · Forks: 33 · Downloads: 8,112 · Commits (30d): 0
  Language: Rust · License: Apache-2.0
  Flags: No Package, No Dependents

tokenizer — score 31 (Emerging)
  Maintenance 0/25 · Adoption 3/25 · Maturity 16/25 · Community 12/25
  Stars: 3 · Forks: 1 · Downloads: n/a · Commits (30d): 0
  Language: Rust · License: MIT
  Flags: Stale 6m, No Package, No Dependents

About rust-tokenizers

guillaume-be/rust-tokenizers

Rust-tokenizers offers high-performance tokenizers for modern language models, including WordPiece, Byte-Pair Encoding (BPE), and Unigram (SentencePiece) models.

This is a high-performance library that helps developers prepare text for use with large language models, such as BERT, GPT, and RoBERTa. It takes raw text input and converts it into numerical tokens, which are then fed into machine learning models. The primary users are developers building applications that process natural language, such as chatbots, sentiment analysis tools, or machine translation systems.
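The text-to-token-IDs step described above can be illustrated with a minimal sketch. This is not the rust-tokenizers API; the `encode` function, the toy whitespace splitting, and the `[UNK]` fallback id are illustrative assumptions (real tokenizers use subword algorithms such as BPE or WordPiece rather than whole-word lookup):

```rust
use std::collections::HashMap;

// Minimal sketch of the "raw text -> numerical tokens" step.
// Hypothetical helper, not the rust-tokenizers API: splits on whitespace
// and maps each token to a vocabulary id, falling back to an [UNK] id.
fn encode(text: &str, vocab: &HashMap<&str, u32>, unk_id: u32) -> Vec<u32> {
    text.split_whitespace()
        .map(|tok| *vocab.get(tok).unwrap_or(&unk_id))
        .collect()
}

fn main() {
    // Toy two-word vocabulary; id 99 stands in for [UNK].
    let vocab: HashMap<&str, u32> = [("hello", 0), ("world", 1)].into_iter().collect();
    let ids = encode("hello unknown world", &vocab, 99);
    println!("{:?}", ids); // [0, 99, 1]
}
```

A real model pipeline would feed these ids into the embedding layer; the library's value is doing this mapping with subword units so unknown words decompose into known pieces instead of collapsing to `[UNK]`.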

natural-language-processing machine-learning-engineering text-pre-processing AI-development computational-linguistics

About tokenizer

Usama3627/tokenizer

Implementation of BPE Tokenizer in Rust
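The core of a BPE tokenizer like this one is repeatedly merging the most frequent adjacent symbol pair. A minimal sketch of one merge step follows; the function names `most_frequent_pair` and `merge_pair` are ours, not taken from this repository:

```rust
use std::collections::HashMap;

// Count adjacent symbol pairs and return the most frequent one.
// Illustrative sketch of the core BPE idea, not this repo's code.
fn most_frequent_pair(symbols: &[String]) -> Option<(String, String)> {
    let mut counts: HashMap<(String, String), usize> = HashMap::new();
    for w in symbols.windows(2) {
        *counts.entry((w[0].clone(), w[1].clone())).or_insert(0) += 1;
    }
    counts.into_iter().max_by_key(|&(_, c)| c).map(|(pair, _)| pair)
}

// Fuse every non-overlapping occurrence of `pair` into one symbol.
fn merge_pair(symbols: &[String], pair: &(String, String)) -> Vec<String> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < symbols.len() {
        if i + 1 < symbols.len() && symbols[i] == pair.0 && symbols[i + 1] == pair.1 {
            out.push(format!("{}{}", pair.0, pair.1));
            i += 2;
        } else {
            out.push(symbols[i].clone());
            i += 1;
        }
    }
    out
}

fn main() {
    // "aaab" -> pairs ("a","a") x2, ("a","b") x1, so ("a","a") is merged first.
    let word: Vec<String> = "aaab".chars().map(|c| c.to_string()).collect();
    let pair = most_frequent_pair(&word).unwrap();
    let merged = merge_pair(&word, &pair);
    println!("{:?}", merged); // ["aa", "a", "b"]
}
```

Training a full BPE vocabulary repeats this step over a corpus until a target vocabulary size is reached, recording each merge so encoding can replay the merges in order.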

Scores updated daily from GitHub, PyPI, and npm data.