ml-rust/splintr

A high-performance tokenizer (BPE + SentencePiece) written in Rust with Python bindings, focused on speed, safety, and resource optimization.

Score: 45 / 100 (Emerging)

This tool helps AI engineers and machine learning practitioners convert large volumes of text into tokens, and vice versa. It takes raw text such as prompts, documents, or training data and outputs the numerical token IDs that large language models (LLMs) consume. It suits anyone working with LLMs who needs to prepare data efficiently or process model outputs in real time.

Use this if you are an AI engineer or ML practitioner building LLM applications, training models, or processing large text datasets and need a significantly faster way to tokenize text than existing Python-based solutions.

Not ideal if you are working with very small, infrequent text inputs or if your current tokenization speed is not a bottleneck for your workflow.
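
As a rough illustration of the text-to-token round trip described above, here is a toy byte-level BPE in pure Python. It is not splintr's API or implementation: the function names (`train_bpe`, `merge`, `encode`, `decode`) and the tiny training corpus are hypothetical and chosen only for this sketch. A production tokenizer would add pre-tokenization, special tokens, and a far faster merge loop.

```python
from collections import Counter


def merge(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` in `ids` with `new_id`."""
    out = []
    i = 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(corpus: bytes, num_merges: int):
    """Learn up to `num_merges` merge rules from raw bytes (toy trainer)."""
    ids = list(corpus)  # start from the 256 single-byte tokens
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(ids, ids[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:  # nothing left worth merging
            break
        ids = merge(ids, pair, 256 + len(merges))
        merges.append(pair)
    return merges


def encode(text: str, merges):
    """Text -> token IDs: apply the learned merges in training order."""
    ids = list(text.encode("utf-8"))
    for rank, pair in enumerate(merges):
        ids = merge(ids, pair, 256 + rank)
    return ids


def decode(ids, merges) -> str:
    """Token IDs -> text: expand every token back to its bytes."""
    vocab = {i: bytes([i]) for i in range(256)}
    for rank, (a, b) in enumerate(merges):
        vocab[256 + rank] = vocab[a] + vocab[b]
    return b"".join(vocab[i] for i in ids).decode("utf-8")


# Hypothetical usage: train on a tiny corpus, then round-trip a string.
merges = train_bpe(b"low lower lowest low low", num_merges=10)
ids = encode("lower low", merges)
text = decode(ids, merges)
```

Encoding followed by decoding returns the original string, and the token list is shorter than the raw byte sequence whenever a learned merge applies.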

LLM-development AI-engineering data-preprocessing natural-language-processing model-training
No Package No Dependents
Maintenance 10 / 25
Adoption 13 / 25
Maturity 13 / 25
Community 9 / 25

Stars 57
Forks 5
Language Python
License MIT
Category bpe-tokenizers
Last pushed Mar 12, 2026
Monthly downloads 130
Commits (30d) 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/ml-rust/splintr"

Open to everyone: 100 requests/day with no key. A free key raises the limit to 1,000/day.
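
The same request can be made from Python with only the standard library. This is a minimal sketch under two assumptions that the page does not confirm: that the endpoint returns JSON, and that the URL follows a `category/owner/repo` pattern generalized from the single example above. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

# Base path taken from the curl example; the path layout below is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch and parse the response, assuming the body is JSON."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("nlp", "ml-rust", "splintr")` issues the same GET request as the curl command and counts against the daily request limit.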