nlpodyssey/gotokenizers
Go implementation of today's most used tokenizers
This is a foundational tool for Go developers building applications that process human language. It converts raw text into numerical tokens, the input format machine learning models require for tasks like translation or sentiment analysis, and produces a structured token sequence ready for further natural language processing.
No commits in the last 6 months.
Use this if you are a Go developer building an application that needs to break down natural language text into discrete tokens for machine learning or advanced text analysis, and you prefer a pure Go implementation.
Not ideal if you need a high-performance library for production NLP systems today: this is an early-stage project focused on feature parity rather than optimization.
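To make "breaking text into discrete tokens" concrete, here is a minimal sketch in Go of what a tokenizer does at the simplest level, splitting on whitespace and separating punctuation. This is a generic illustration of the concept, not the gotokenizers API; the `tokenize` function is a hypothetical helper written for this example.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// tokenize splits text on whitespace and treats each punctuation
// rune as its own token. A hypothetical sketch of the concept only;
// real tokenizers (BPE, WordPiece, etc.) are far more sophisticated.
func tokenize(text string) []string {
	var tokens []string
	var b strings.Builder
	flush := func() {
		if b.Len() > 0 {
			tokens = append(tokens, b.String())
			b.Reset()
		}
	}
	for _, r := range text {
		switch {
		case unicode.IsSpace(r):
			flush() // whitespace ends the current token
		case unicode.IsPunct(r):
			flush() // punctuation ends the token and is a token itself
			tokens = append(tokens, string(r))
		default:
			b.WriteRune(r) // accumulate word characters
		}
	}
	flush()
	return tokens
}

func main() {
	fmt.Println(tokenize("Hello, world! Tokenizers split text."))
	// → [Hello , world ! Tokenizers split text .]
}
```

Libraries like this one go further, mapping each token to a numeric ID from a trained vocabulary so the sequence can be fed to a model.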
Stars
44
Forks
5
Language
Go
License
BSD-2-Clause
Category
Last pushed
Dec 12, 2020
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/nlpodyssey/gotokenizers"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Compare
Higher-rated alternatives
huggingface/tokenizers
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
megagonlabs/ginza-transformers
Use custom tokenizers in spacy-transformers
Kaleidophon/token2index
A lightweight but powerful library to build token indices for NLP tasks, compatible with major...
Hugging-Face-Supporter/tftokenizers
Use Huggingface Transformer and Tokenizers as Tensorflow Reusable SavedModels
NVIDIA/Cosmos-Tokenizer
A suite of image and video neural tokenizers