ziliwang/gpt_tokenizer
C++ RoBERTa tokenizer for deployment use
This tool lets developers add text tokenization directly to their C++ applications, especially for large-scale deployments. It converts raw text strings into tokens and their numerical IDs, padded as needed, so they can be fed into machine learning models. It is aimed at C++ engineers building applications that need efficient, production-ready text tokenization.
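As a rough illustration of the padding step, the sketch below takes token IDs that a BPE tokenizer has already produced, wraps them in RoBERTa's `<s>` and `</s>` markers, truncates to a fixed length, and right-pads with an attention mask. The names `EncodedInput` and `pad_and_wrap` are hypothetical and not this repository's API; the default special IDs (`<s>`=0, `<pad>`=1, `</s>`=2) assume the standard roberta-base vocabulary and should be checked against the vocabulary you actually load.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for model-ready input (not the repo's API).
struct EncodedInput {
    std::vector<int32_t> input_ids;       // token IDs, padded to max_len
    std::vector<int32_t> attention_mask;  // 1 for real tokens, 0 for padding
};

// Wraps content token IDs as <s> ... </s>, truncates so the result fits in
// max_len, then right-pads with pad_id. Default IDs assume roberta-base.
EncodedInput pad_and_wrap(const std::vector<int32_t>& token_ids,
                          std::size_t max_len,
                          int32_t bos_id = 0,
                          int32_t pad_id = 1,
                          int32_t eos_id = 2) {
    assert(max_len >= 2 && "need room for <s> and </s>");
    EncodedInput out;
    out.input_ids.reserve(max_len);
    out.attention_mask.reserve(max_len);

    out.input_ids.push_back(bos_id);
    // Keep at most max_len - 2 content tokens so </s> always fits.
    const std::size_t keep = std::min(token_ids.size(), max_len - 2);
    out.input_ids.insert(out.input_ids.end(), token_ids.begin(),
                         token_ids.begin() + static_cast<std::ptrdiff_t>(keep));
    out.input_ids.push_back(eos_id);
    out.attention_mask.assign(out.input_ids.size(), 1);

    // Right-pad: padded positions get pad_id and are masked out with 0.
    out.input_ids.resize(max_len, pad_id);
    out.attention_mask.resize(max_len, 0);
    return out;
}
```

For example, `pad_and_wrap({100, 200, 300}, 8)` yields `input_ids` of `{0, 100, 200, 300, 2, 1, 1, 1}` and `attention_mask` of `{1, 1, 1, 1, 1, 0, 0, 0}`. Right-padding is assumed here because it is the usual convention for encoder models like RoBERTa; a batch could instead be padded to its longest sequence rather than a fixed `max_len`.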
No commits in the last 6 months.
Use this if you are a C++ developer needing to incorporate GPT-style text tokenization directly into your deployed applications for tasks like natural language processing.
Not ideal if you are a data scientist or researcher working primarily in Python, or if you need a tokenizer for models other than GPT/RoBERTa.
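GPT-2 and RoBERTa use byte-level byte-pair encoding (BPE): each pre-split word starts as a sequence of single symbols, and the adjacent pair with the lowest merge rank (its position in the merge list, `merges.txt` in the Hugging Face release) is merged repeatedly until no ranked pair remains. The sketch below shows only that merge loop, under simplifying assumptions: it starts from raw `char`s instead of GPT-2's byte-to-unicode mapping, and it skips the regex pre-tokenizer and the vocabulary lookup. `MergeRanks` and `bpe_merge` are hypothetical names, not this repository's API.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Merge table: (left, right) -> rank; a lower rank means the merge was learned earlier.
// Hypothetical in-memory form of a merge list such as merges.txt.
using MergeRanks = std::map<std::pair<std::string, std::string>, int>;

// Applies byte-pair merges to one pre-split word, lowest rank first,
// until no adjacent pair appears in the table.
std::vector<std::string> bpe_merge(const std::string& word, const MergeRanks& ranks) {
    assert(!word.empty() && "pre-tokenizer should not emit empty words");

    // Start from single characters (GPT-2 starts from byte-to-unicode mapped bytes).
    std::vector<std::string> symbols;
    for (char c : word) symbols.emplace_back(1, c);

    while (symbols.size() > 1) {
        // Find the adjacent pair with the lowest merge rank.
        int best_rank = INT_MAX;
        std::pair<std::string, std::string> best_pair;
        for (std::size_t i = 0; i + 1 < symbols.size(); ++i) {
            auto it = ranks.find({symbols[i], symbols[i + 1]});
            if (it != ranks.end() && it->second < best_rank) {
                best_rank = it->second;
                best_pair = it->first;
            }
        }
        if (best_rank == INT_MAX) break;  // no ranked pair left to merge

        // Merge every non-overlapping occurrence of best_pair, scanning left to right.
        std::vector<std::string> merged;
        merged.reserve(symbols.size());
        for (std::size_t i = 0; i < symbols.size();) {
            if (i + 1 < symbols.size() && symbols[i] == best_pair.first &&
                symbols[i + 1] == best_pair.second) {
                merged.push_back(symbols[i] + symbols[i + 1]);
                i += 2;
            } else {
                merged.push_back(symbols[i]);
                i += 1;
            }
        }
        symbols = std::move(merged);
    }
    return symbols;
}
```

With ranks `{("l","o"):0, ("lo","w"):1, ("e","r"):2}`, the word `"lower"` goes `l o w e r` → `lo w e r` → `low e r` → `low er`. `std::map` keeps the sketch short; a production tokenizer would more likely use a hash map and cache the result for each word it has already seen.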
Stars: 10
Forks: 4
Language: C++
License: none listed
Category: none listed
Last pushed: Dec 03, 2020
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/ziliwang/gpt_tokenizer"
Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.
Higher-rated alternatives
aiqinxuancai/TiktokenSharp
Token calculation for OpenAI models, using `o200k_base` `cl100k_base` `p50k_base` encoding.
dqbd/tiktokenizer
Online playground for OpenAI tokenizers
pkoukk/tiktoken-go
Go version of tiktoken
microsoft/Tokenizer
TypeScript and .NET implementation of a BPE tokenizer for OpenAI LLMs.
lenML/tokenizers
A lightweight, dependency-free fork of transformers.js (tokenizers only)