nlpodyssey/gotokenizers
Go implementation of today's most used tokenizers
This is a foundational tool for Go developers building applications that process human language. It converts raw text into numerical tokens, the input format machine learning models require for tasks like translation or sentiment analysis, and produces a structured token sequence ready for further natural language processing.
No commits in the last 6 months.
Use this if you are a Go developer building an application that needs to break down natural language text into discrete tokens for machine learning or advanced text analysis, and you prefer a pure Go implementation.
Not ideal if you need a high-performance library for production NLP systems today: this is an early-stage project focused on feature parity rather than optimization.
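To make "breaking text into discrete tokens" concrete, here is a minimal sketch in Go of what a tokenizer does at the simplest level, splitting on whitespace and separating punctuation. This is a generic illustration of the concept, not the gotokenizers API; the `tokenize` function is a hypothetical helper written for this example.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// tokenize splits text on whitespace and treats each punctuation
// rune as its own token. A hypothetical sketch of the concept only;
// real tokenizers (BPE, WordPiece, etc.) are far more sophisticated.
func tokenize(text string) []string {
	var tokens []string
	var b strings.Builder
	flush := func() {
		if b.Len() > 0 {
			tokens = append(tokens, b.String())
			b.Reset()
		}
	}
	for _, r := range text {
		switch {
		case unicode.IsSpace(r):
			flush() // whitespace ends the current token
		case unicode.IsPunct(r):
			flush() // punctuation ends the token and is a token itself
			tokens = append(tokens, string(r))
		default:
			b.WriteRune(r) // accumulate word characters
		}
	}
	flush()
	return tokens
}

func main() {
	fmt.Println(tokenize("Hello, world! Tokenizers split text."))
	// → [Hello , world ! Tokenizers split text .]
}
```

Libraries like this one go further, mapping each token to a numeric ID from a trained vocabulary so the sequence can be fed to a model.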
Stars
44
Forks
5
Language
Go
License
BSD-2-Clause
Category
Last pushed
Dec 12, 2020
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/nlpodyssey/gotokenizers"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Compare
Higher-rated alternatives
huggingface/tokenizers
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
megagonlabs/ginza-transformers
Use custom tokenizers in spacy-transformers
Kaleidophon/token2index
A lightweight but powerful library to build token indices for NLP tasks, compatible with major...
Hugging-Face-Supporter/tftokenizers
Use Huggingface Transformer and Tokenizers as Tensorflow Reusable SavedModels
NVIDIA/Cosmos-Tokenizer
A suite of image and video neural tokenizers