lenML/tokenizers
A lightweight, zero-dependency fork of transformers.js (tokenizers only)
This project helps developers integrate text processing for large language models (LLMs) into their applications, especially when working offline. It takes raw text as input and breaks it into individual tokens (words or sub-word units), the form of input LLMs require. It is aimed at developers building applications that need efficient, offline tokenization without relying on external servers or heavy dependencies.
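To illustrate what "breaking text into sub-word units" means, here is a toy greedy longest-match tokenizer. This is a minimal sketch of the general idea only; the `vocab` list and `tokenize` function are made up for illustration and are not this library's API.

```javascript
// Toy illustration of sub-word tokenization (NOT this library's API):
// greedily match the longest vocabulary entry at each position.
const vocab = ["token", "tok", "izer", "ize", "s", "en", "t"];

function tokenize(text) {
  const tokens = [];
  let i = 0;
  while (i < text.length) {
    // Find the longest vocab entry matching at position i.
    const match = vocab
      .filter((v) => text.startsWith(v, i))
      .sort((a, b) => b.length - a.length)[0];
    if (!match) { i += 1; continue; } // skip unknown characters
    tokens.push(match);
    i += match.length;
  }
  return tokens;
}

console.log(tokenize("tokenizers")); // → ["token", "izer", "s"]
```

Real BPE tokenizers build their vocabularies from training data and use merge rules rather than a hand-written word list, but the output shape — a sequence of sub-word tokens — is the same.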
Available on npm.
Use this if you are building an application that needs to tokenize text for various LLMs and want offline operation or a lightweight solution without full model dependencies.
Not ideal if you need to run the full ONNX models alongside the tokenizers; this project covers tokenization only, not model inference.
Stars
32
Forks
1
Language
JavaScript
License
MIT
Category
Last pushed
Jan 21, 2026
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/lenML/tokenizers"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
aiqinxuancai/TiktokenSharp
Token calculation for OpenAI models, using `o200k_base` `cl100k_base` `p50k_base` encoding.
dqbd/tiktokenizer
Online playground for OpenAI tokenizers
pkoukk/tiktoken-go
go version of tiktoken
microsoft/Tokenizer
Typescript and .NET implementation of BPE tokenizer for OpenAI LLMs.
tryAGI/Tiktoken
This project implements token calculation for OpenAI's gpt-4 and gpt-3.5-turbo model,...