lenML/tokenizers
A lightweight, zero-dependency fork of transformers.js (tokenizers only)
This project helps developers integrate text processing for large language models (LLMs) into their applications, especially when working offline. It takes raw text as input and breaks it into individual tokens (words or sub-word units), the form of input LLMs require. It is aimed at developers building applications that need efficient, offline tokenization without relying on external servers or heavy dependencies.
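To illustrate what "breaking text into sub-word units" means, here is a toy greedy longest-match tokenizer. This is a minimal sketch of the general idea only; the `vocab` list and `tokenize` function are made up for illustration and are not this library's API.

```javascript
// Toy illustration of sub-word tokenization (NOT this library's API):
// greedily match the longest vocabulary entry at each position.
const vocab = ["token", "tok", "izer", "ize", "s", "en", "t"];

function tokenize(text) {
  const tokens = [];
  let i = 0;
  while (i < text.length) {
    // Find the longest vocab entry matching at position i.
    const match = vocab
      .filter((v) => text.startsWith(v, i))
      .sort((a, b) => b.length - a.length)[0];
    if (!match) { i += 1; continue; } // skip unknown characters
    tokens.push(match);
    i += match.length;
  }
  return tokens;
}

console.log(tokenize("tokenizers")); // → ["token", "izer", "s"]
```

Real BPE tokenizers build their vocabularies from training data and use merge rules rather than a hand-written word list, but the output shape — a sequence of sub-word tokens — is the same.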
Available on npm.
Use this if you are building an application that needs to tokenize text for various LLMs and want offline operation or a lightweight solution without full model dependencies.
Not ideal if you need to run the full ONNX models alongside the tokenizers; this project covers tokenization only, not model inference.
Stars
32
Forks
1
Language
JavaScript
License
MIT
Category
Last pushed
Jan 21, 2026
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/lenML/tokenizers"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
aiqinxuancai/TiktokenSharp
Token calculation for OpenAI models, using `o200k_base` `cl100k_base` `p50k_base` encoding.
dqbd/tiktokenizer
Online playground for OpenAI tokenizers
pkoukk/tiktoken-go
go version of tiktoken
microsoft/Tokenizer
Typescript and .NET implementation of BPE tokenizer for OpenAI LLMs.
tryAGI/Tiktoken
This project implements token calculation for OpenAI's gpt-4 and gpt-3.5-turbo model,...