Tokenization Libraries: LLM Tools

Libraries and tools that tokenize text with OpenAI's tiktoken encodings across multiple programming languages and platforms. The category does NOT include general text processing, language models themselves, or token-count estimators that approximate without performing full tokenization.

46 tokenization library tools are tracked. One scores above 50 (the Established tier). The highest-rated is aiqinxuancai/TiktokenSharp at 52/100 with 126 stars.
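The relationship between score and tier can be sketched in a few lines of Python. Only the "above 50" cutoff for Established is stated on this page. The Emerging/Experimental boundary used here is an assumption inferred from the listing (30 appears as Emerging, 29 as Experimental), and `tier_for` is a hypothetical helper name, not part of any published API.

```python
def tier_for(score: int) -> str:
    """Map a 0-100 quality score to a tier label.

    Thresholds are illustrative assumptions, not a documented scoring rule.
    """
    if score > 50:
        # Stated on the page: scores above 50 are the Established tier.
        return "Established"
    if score >= 30:
        # Assumption inferred from the listing: 30 is Emerging, 29 is Experimental.
        return "Emerging"
    return "Experimental"
```

With these assumed cutoffs, every score/tier pair in the listing is reproduced, but the real scoring rule may differ at values the listing does not cover (for example, exactly 50).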

Get all 46 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=llm-tools&subcategory=tokenization-libraries&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
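For scripted access, the same request can be made from Python's standard library. This is a minimal sketch: the URL and query parameters are copied from the curl command above, while the function names `build_url` and `fetch_projects` are hypothetical. The shape of the JSON response is not documented on this page, so the sketch returns the parsed payload without assuming any field names, and it does not assume that `limit` values above 20 are accepted.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain: str = "llm-tools",
              subcategory: str = "tokenization-libraries",
              limit: int = 20) -> str:
    """Build the dataset URL with the same query parameters as the curl example."""
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE_URL}?{query}"


def fetch_projects(url: str, timeout: float = 30.0):
    """Fetch the URL and return the parsed JSON payload as-is.

    No key is sent, so this falls under the keyless daily request limit.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_projects(build_url())` performs the same request as the curl command. Inspect the returned object before relying on any particular key.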

| # | Tool | Description | Score | Tier |
|---|------|-------------|-------|------|
| 1 | aiqinxuancai/TiktokenSharp | Token calculation for OpenAI models, using `o200k_base` `cl100k_base`... | 52 | Established |
| 2 | pkoukk/tiktoken-go | Go version of tiktoken | 48 | Emerging |
| 3 | dqbd/tiktokenizer | Online playground for OpenAI tokenizers | 48 | Emerging |
| 4 | microsoft/Tokenizer | TypeScript and .NET implementation of BPE tokenizer for OpenAI LLMs. | 47 | Emerging |
| 5 | lenML/tokenizers | A lightweight no-dependency fork from transformers.js (only tokenizers) | 46 | Emerging |
| 6 | tryAGI/Tiktoken | This project implements token calculation for OpenAI's gpt-4 and... | 45 | Emerging |
| 7 | geckse/n8n-nodes-gpt-tokenizer | n8n node for working with BPE Tokens with GPT in mind. | 42 | Emerging |
| 8 | MichaelCurrin/token-translator | Convert the token input limits of LLMs like ChatGPT into real-world measures... | 41 | Emerging |
| 9 | dmitry-brazhenko/SharpToken | SharpToken is a C# library for tokenizing natural language text. It's based... | 41 | Emerging |
| 10 | aallam/ktoken | Kotlin multiplatform BPE tokenizer library for OpenAI models | 41 | Emerging |
| 11 | AI21Labs/ai21-tokenizer | AI21's Jamba models tokenizers | 40 | Emerging |
| 12 | botisan-ai/gpt3-tokenizer | Isomorphic JavaScript/TypeScript Tokenizer for GPT-3 and Codex Models by OpenAI. | 39 | Emerging |
| 13 | zkry/tiktoken.el | tiktoken.el is an Emacs Lisp port of OpenAI's tiktoken library for BPE tokenization | 36 | Emerging |
| 14 | coder/ai-tokenizer | A faster than tiktoken tokenizer with first-class support for Vercel's AI SDK. | 34 | Emerging |
| 15 | oelmekki/tiktoken-cli | Simple wrapper around tiktoken to use it in your favorite language. | 34 | Emerging |
| 16 | samber/tiktoken-cli | 🧮 CLI for counting tokens in files and directories using tiktoken | 32 | Emerging |
| 17 | Thibault00/runtoken | A blazing-fast BPE tokenizer for LLMs. Drop-in tiktoken replacement, 20-80x faster. | 31 | Emerging |
| 18 | Dev-in-a-Box-Limited/TokenEvaluator.Net | TokenEvaluator.Net is a simple and useful library designed to measure and... | 30 | Emerging |
| 19 | kgruiz/PyTokenCounter | A simple Python library for tokenizing text and counting tokens. While... | 29 | Experimental |
| 20 | peterheb/gotoken | Gotoken is a pure-Go implementation of the Python library openai/tiktoken. | 29 | Experimental |
| 21 | Darkatse/MikTik | A multi-model tokenizer, in Rust. | 28 | Experimental |
| 22 | unitythemaker/tokdu | tokdu (Token Disk Usage) is a terminal-based utility that helps you analyze... | 28 | Experimental |
| 23 | ziliwang/gpt_tokenizer | C++ RoBERTa tokenizer for deployment use | 28 | Experimental |
| 24 | valmat/gpt-tokenator | GPT-3 token counter | 28 | Experimental |
| 25 | qbit-ai/tokenx-rs | Rust port of johannschopplich/tokenx - Fast token count estimation for LLMs... | 27 | Experimental |
| 26 | MrTechyWorker/chartokenizer | Chartokenizer is a Python package for basic character-level tokenization. It... | 26 | Experimental |
| 27 | AndresEspin1993/b2t-tokenizer | B2T - Tokenizer for the AI Systems. | 24 | Experimental |
| 28 | n4ryn/genai-tokenizer | GenAi Tokenizer is an interactive tokenizer playground to explore how text... | 23 | Experimental |
| 29 | claylo/ah-ah-ah | VUN token! TWO tokens! Count all the beautiful tokens ... offline! Ah-ah-ah! | 22 | Experimental |
| 30 | gemologic/carat | A quick CLI tool to estimate token count | 21 | Experimental |
| 31 | rodneylab/tokenator | Count the number of tokens in an LLM prompt | 21 | Experimental |
| 32 | agentstation/tokenizer | High-performance tokenizer implementations in Go with unified CLI. Features... | 21 | Experimental |
| 33 | CTCycle/TKBEN-tokenizers-benchmarker | Explore and benchmark public and custom tokenizers from HuggingFace using... | 21 | Experimental |
| 34 | w95/tiktoken | The Tiktoken API is a tool that enables developers to calculate the token... | 20 | Experimental |
| 35 | JacobLinCool/Tiktoken-Calculator | Calculate the token count for GPT-4, GPT-3.5, GPT-3, and GPT-2. | 20 | Experimental |
| 36 | kgruiz/token-counter | Rust CLI for counting or tokenizing text, files, or directories with OpenAI... | 17 | Experimental |
| 37 | Marcelleedit7272/genai-tokenizer | 🧠 Explore tokenization with GenAi-Tokenizer, a user-friendly tool for... | 15 | Experimental |
| 38 | feralghost/token-counter | Free API to count tokens for GPT-4, Claude, Gemini, and more. No API key... | 14 | Experimental |
| 39 | WinPooh32/tokc | Token counting utility | 13 | Experimental |
| 40 | fengkx/tu | A du-like CLI for counting tokens | 13 | Experimental |
| 41 | hardesttype/switch-tokenizer | A multilingual tokenization approach that maps different language tokenizers... | 13 | Experimental |
| 42 | teasec4/gpt_tokenizer | GPT Tokenizer | 13 | Experimental |
| 43 | XucroYuri/Tokenlink | A Token-Based Semantic Association Mining Tool | 12 | Experimental |
| 44 | kricsleo/gpt-token | Calculate the number of text tokens in GPT. | 11 | Experimental |
| 45 | MilanSuk/token_go | Simple & fast Encoder/Decoder for tiktoken vocabulary. | 11 | Experimental |
| 46 | cameronk/token-counter | Wraps @dqbd/tiktoken to count the number of tokens used by various OpenAI models. | 10 | Experimental |