ziliwang/gpt_tokenizer

C++ RoBERTa tokenizer for deployment use

28 / 100 (Experimental)

This tool helps developers integrate text processing capabilities directly into their C++ applications, especially for large-scale deployments. It takes raw text strings and converts them into numerical sequences (tokens and IDs) that can be fed into machine learning models, along with appropriate padding. This is ideal for C++ engineers building applications that need efficient, production-ready text tokenization.
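To make the text-to-IDs-with-padding flow concrete, here is a minimal sketch. It is not this repository's API: `Encoded`, `encode_padded`, and the toy vocabulary are hypothetical names chosen for illustration, and whitespace splitting stands in for the byte-level BPE that a real GPT/RoBERTa tokenizer performs. The special-token IDs are passed in by the caller; the usage note assumes the RoBERTa convention of `<s>` = 0, `<pad>` = 1, `</s>` = 2, `<unk>` = 3.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical result type: padded token IDs plus a mask that is 1 for
// real tokens and 0 for padding positions.
struct Encoded {
    std::vector<int> ids;
    std::vector<int> mask;
};

// Toy encoder (assumption: whitespace splitting replaces real byte-level BPE).
// Wraps the sequence in begin/end markers, truncates to max_len, then pads.
Encoded encode_padded(const std::string& text,
                      const std::unordered_map<std::string, int>& vocab,
                      std::size_t max_len,
                      int bos_id, int eos_id, int pad_id, int unk_id) {
    Encoded out;
    if (max_len < 2) {
        return out;  // no room for the two special tokens
    }

    out.ids.push_back(bos_id);

    std::istringstream stream(text);
    std::string word;
    // Stop early so one slot always remains for the end-of-sequence marker.
    while (stream >> word && out.ids.size() + 1 < max_len) {
        auto it = vocab.find(word);
        out.ids.push_back(it != vocab.end() ? it->second : unk_id);
    }

    out.ids.push_back(eos_id);

    // Mark every position filled so far as a real token, then pad both
    // vectors out to the fixed length the model expects.
    out.mask.assign(out.ids.size(), 1);
    out.ids.resize(max_len, pad_id);
    out.mask.resize(max_len, 0);
    return out;
}
```

With a toy vocabulary of `{"hello": 10, "world": 11}`, calling `encode_padded("hello there world", vocab, 8, 0, 2, 1, 3)` yields the IDs `0 10 3 11 2 1 1 1` and the mask `1 1 1 1 1 0 0 0`: the out-of-vocabulary word maps to the unknown ID, and the tail is filled with the padding ID.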

No commits in the last 6 months.

Use this if you are a C++ developer needing to incorporate GPT-style text tokenization directly into your deployed applications for tasks like natural language processing.

Not ideal if you are a data scientist or researcher working primarily in Python, or if you need a tokenizer for models other than GPT/RoBERTa.

Tags: C++ development, NLP deployment, text processing, machine learning engineering
Status: No License, Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 5 / 25
Maturity 8 / 25
Community 15 / 25


Stars: 10
Forks: 4
Language: C++
License: None
Last pushed: Dec 03, 2020
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/ziliwang/gpt_tokenizer"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.