Tokenization Libraries LLM Tools
Libraries and tools for tokenizing text using OpenAI's tiktoken encoding across multiple programming languages and platforms. Does NOT include general text processing, language models themselves, or token estimation approximations without full tokenization.
There are 46 tokenization library tools tracked. One scores above 50 (established tier). The highest-rated is aiqinxuancai/TiktokenSharp at 52/100 with 126 stars.
Get all 46 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=llm-tools&subcategory=tokenization-libraries&limit=20"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
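The same request can be made from a script. The following is a minimal Python sketch using only the standard library, not an official client. The helper names `build_url` and `fetch_projects` are hypothetical, and the shape of the JSON response is not documented here, so the code decodes the body without assuming any field names. Note that the sample URL passes `limit=20`; whether a larger value such as `limit=46` is accepted in order to return every project is an assumption to verify against the API.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="llm-tools", subcategory="tokenization-libraries", limit=20):
    """Build the dataset URL with the same query parameters as the curl example.

    `limit` defaults to 20 as in the sample; larger values are an untested assumption.
    """
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=30):
    """Fetch the URL and decode the JSON body. The response schema is not assumed."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage (performs a network request, so it is left commented out):
# data = fetch_projects(build_url())
# print(json.dumps(data, indent=2)[:2000])
```

Keeping URL construction separate from the network call makes it easy to check the query string offline and to stay within the daily request limit while experimenting.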
| # | Tool | Score | Tier |
|---|---|---|---|
| 1 | aiqinxuancai/TiktokenSharp: Token calculation for OpenAI models, using `o200k_base` `cl100k_base`... | 52 | Established |
| 2 | pkoukk/tiktoken-go: go version of tiktoken | | Emerging |
| 3 | dqbd/tiktokenizer: Online playground for OpenAI tokenizers | | Emerging |
| 4 | microsoft/Tokenizer: Typescript and .NET implementation of BPE tokenizer for OpenAI LLMs. | | Emerging |
| 5 | lenML/tokenizers: a lightweight no-dependency fork from transformers.js (only tokenizers) | | Emerging |
| 6 | tryAGI/Tiktoken: This project implements token calculation for OpenAI's gpt-4 and... | | Emerging |
| 7 | geckse/n8n-nodes-gpt-tokenizer: n8n node for working with BPE Tokens with GPT in mind. | | Emerging |
| 8 | MichaelCurrin/token-translator: Convert the token input limits of LLMs like ChatGPT into real-world measures... | | Emerging |
| 9 | dmitry-brazhenko/SharpToken: SharpToken is a C# library for tokenizing natural language text. It's based... | | Emerging |
| 10 | aallam/ktoken: Kotlin multiplatform BPE tokenizer library for OpenAI models | | Emerging |
| 11 | AI21Labs/ai21-tokenizer: AI21's Jamba models tokenizers | | Emerging |
| 12 | botisan-ai/gpt3-tokenizer: Isomorphic JavaScript/TypeScript Tokenizer for GPT-3 and Codex Models by OpenAI. | | Emerging |
| 13 | zkry/tiktoken.el: tiktoken.el is an Emacs Lisp port of OpenAI's tiktoken library for BPE tokenization | | Emerging |
| 14 | coder/ai-tokenizer: A faster than tiktoken tokenizer with first-class support for Vercel's AI SDK. | | Emerging |
| 15 | oelmekki/tiktoken-cli: Simple wrapper around tiktoken to use it in your favorite language. | | Emerging |
| 16 | samber/tiktoken-cli: 🧮 CLI for counting tokens in files and directories using tiktoken | | Emerging |
| 17 | Thibault00/runtoken: A blazing-fast BPE tokenizer for LLMs. Drop-in tiktoken replacement, 20-80x faster. | | Emerging |
| 18 | Dev-in-a-Box-Limited/TokenEvaluator.Net: TokenEvaluator.Net is a simple and useful library designed to measure and... | | Emerging |
| 19 | kgruiz/PyTokenCounter: A simple Python library for tokenizing text and counting tokens. While... | | Experimental |
| 20 | peterheb/gotoken: Gotoken is a pure-Go implementation of the Python library openai/tiktoken. | | Experimental |
| 21 | Darkatse/MikTik: A multi-model tokenizer, in Rust. | | Experimental |
| 22 | unitythemaker/tokdu: tokdu (Token Disk Usage) is a terminal-based utility that helps you analyze... | | Experimental |
| 23 | ziliwang/gpt_tokenizer: cpp roberta tokenizer for deploy using | | Experimental |
| 24 | valmat/gpt-tokenator: GPT 3 tokens counter | | Experimental |
| 25 | qbit-ai/tokenx-rs: Rust port of johannschopplich/tokenx - Fast token count estimation for LLMs... | | Experimental |
| 26 | MrTechyWorker/chartokenizer: Chartokenizer is a Python package for basic character-level tokenization. It... | | Experimental |
| 27 | AndresEspin1993/b2t-tokenizer: B2T - Tokenizer for the AI Systems. | | Experimental |
| 28 | n4ryn/genai-tokenizer: GenAi Tokenizer is an interactive tokenizer playground to explore how text... | | Experimental |
| 29 | claylo/ah-ah-ah: VUN token! TWO tokens! Count all the beautiful tokens ... offline! Ah-ah-ah! | | Experimental |
| 30 | gemologic/carat: a quick cli tool to estimate token count | | Experimental |
| 31 | rodneylab/tokenator: Count the number of tokens in an LLM prompt | | Experimental |
| 32 | agentstation/tokenizer: High-performance tokenizer implementations in Go with unified CLI. Features... | | Experimental |
| 33 | CTCycle/TKBEN-tokenizers-benchmarker: Explore and benchmark public and custom tokenizers from HuggingFace using... | | Experimental |
| 34 | w95/tiktoken: The Tiktoken API is a tool that enables developers to calculate the token... | | Experimental |
| 35 | JacobLinCool/Tiktoken-Calculator: Calculate the token count for GPT-4, GPT-3.5, GPT-3, and GPT-2. | | Experimental |
| 36 | kgruiz/token-counter: Rust CLI for counting or tokenizing text, files, or directories with OpenAI... | | Experimental |
| 37 | Marcelleedit7272/genai-tokenizer: 🧠 Explore tokenization with GenAi-Tokenizer, a user-friendly tool for... | | Experimental |
| 38 | feralghost/token-counter: Free API to count tokens for GPT-4, Claude, Gemini, and more. No API key... | | Experimental |
| 39 | WinPooh32/tokc: Token counting utility | | Experimental |
| 40 | fengkx/tu: A du-like CLI for counting tokens | | Experimental |
| 41 | hardesttype/switch-tokenizer: A multilingual tokenization approach that maps different language tokenizers... | | Experimental |
| 42 | teasec4/gpt_tokenizer: GPT Tokenizer | | Experimental |
| 43 | XucroYuri/Tokenlink: A Token-Based Semantic Association Mining Tool | | Experimental |
| 44 | kricsleo/gpt-token: Calculate the number of text tokens in GPT. | | Experimental |
| 45 | MilanSuk/token_go: Simple & fast Encoder/Decoder for tiktoken vocabulary. | | Experimental |
| 46 | cameronk/token-counter: Wraps @dqbd/tiktoken to count the number of tokens used by various OpenAI models. | | Experimental |