ziliwang/gpt_tokenizer

C++ RoBERTa tokenizer for deployment use

Experimental

This library lets developers run GPT/RoBERTa-style tokenization directly inside C++ applications, especially in large-scale deployments. It takes raw text strings and converts them into tokens and their numerical IDs, padded as needed, so the result can be fed to a machine learning model. It is aimed at C++ engineers who need efficient tokenization in production.
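The padding step mentioned above can be illustrated with a minimal sketch. This is not the repository's API: the `PaddedEncoding` struct and the `pad_to_length` function are hypothetical names introduced here. The pad ID of 1 used in the usage note is an assumption based on the conventional RoBERTa vocabulary, where `<pad>` maps to 1.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for illustration; not a type from gpt_tokenizer.
struct PaddedEncoding {
    std::vector<int64_t> input_ids;       // token IDs, padded on the right
    std::vector<int64_t> attention_mask;  // 1 for real tokens, 0 for padding
};

// Truncate or right-pad `ids` so the result has exactly `max_len` entries.
// `pad_id` is supplied by the caller; which value is correct depends on
// the vocabulary in use (assumed to be 1 for RoBERTa-style vocabularies).
PaddedEncoding pad_to_length(const std::vector<int64_t>& ids,
                             std::size_t max_len,
                             int64_t pad_id) {
    PaddedEncoding out;
    out.input_ids.assign(max_len, pad_id);
    out.attention_mask.assign(max_len, 0);

    // Copy at most max_len real tokens and mark them in the mask.
    const std::size_t n = ids.size() < max_len ? ids.size() : max_len;
    for (std::size_t i = 0; i < n; ++i) {
        out.input_ids[i] = ids[i];
        out.attention_mask[i] = 1;
    }
    return out;
}
```

Calling `pad_to_length(ids, 6, 1)` on a four-token sequence would yield six IDs with the last two set to the pad ID, plus a mask whose last two entries are 0. A fixed length like this is what allows several inputs to be batched into one tensor for inference.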

No commits in the last 6 months.

Use this if you are a C++ developer who needs GPT-style text tokenization inside a deployed natural language processing application.

Not ideal if you are a data scientist or researcher working primarily in Python, or if you need a tokenizer for models other than GPT/RoBERTa.

C++ development, NLP deployment, text processing, machine learning engineering

No License, Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 5 / 25

Maturity 8 / 25

Community 15 / 25

Language: C++

License: none listed

Higher-rated alternatives

aiqinxuancai/TiktokenSharp

Token calculation for OpenAI models, using the `o200k_base`, `cl100k_base`, and `p50k_base` encodings.

dqbd/tiktokenizer

Online playground for OpenAI tokenizers

pkoukk/tiktoken-go

Go version of tiktoken

microsoft/Tokenizer

TypeScript and .NET implementation of a BPE tokenizer for OpenAI LLMs.

lenML/tokenizers

A lightweight, no-dependency fork of transformers.js (tokenizers only)
