sentencepiece and sentencepiece-jni

sentencepiece-jni is a Java (JNI) binding that gives Java code direct access to the core SentencePiece tokenizer library, so the two projects are complements designed to be used together, not alternatives.

| | sentencepiece | sentencepiece-jni |
|---|---|---|
| Score | 78 (Verified) | 41 (Emerging) |
| Maintenance | 17/25 | 0/25 |
| Adoption | 15/25 | 7/25 |
| Maturity | 25/25 | 16/25 |
| Community | 21/25 | 18/25 |
| Stars | 11,697 | 38 |
| Forks | 1,333 | 14 |
| Downloads | | |
| Commits (30d) | 12 | 0 |
| Language | C++ | C++ |
| License | Apache-2.0 | MIT |
| Risk flags | None | Stale 6m, No Package, No Dependents |

About sentencepiece

google/sentencepiece

Unsupervised text tokenizer for Neural Network-based text generation.

This tool helps machine learning engineers prepare raw text data for training neural network-based text generation models. It takes your raw text (like sentences or documents) and breaks it down into smaller, consistent pieces (subword units) suitable for fixed-vocabulary models. You can then feed these standardized units into your neural network, streamlining the text preparation pipeline for natural language processing tasks.
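To make "subword units for a fixed vocabulary" concrete, here is a minimal sketch in plain Python. It is not SentencePiece's own algorithm or API: SentencePiece learns its vocabulary from data, whereas this toy uses a hand-written, hypothetical vocabulary and a simple greedy longest-match rule. The function names `encode_as_pieces` and `encode_as_ids` are illustrative assumptions. The one detail borrowed from SentencePiece is the "▁" (U+2581) meta symbol that marks word boundaries.

```python
from typing import Dict, List

# Toy illustration only: a hand-written vocabulary, not a trained model.
# The list index of each piece serves as its numerical ID.
UNK = "<unk>"
SPACE = "\u2581"  # the "▁" meta symbol used to mark word boundaries
VOCAB: List[str] = [
    UNK, SPACE, SPACE + "raw", SPACE + "text", SPACE + "token",
    "iz", "er", "ation", "s",
]
PIECE_TO_ID: Dict[str, int] = {piece: i for i, piece in enumerate(VOCAB)}
MAX_PIECE_LEN = max(len(piece) for piece in VOCAB)


def encode_as_pieces(text: str) -> List[str]:
    """Split text into vocabulary pieces by greedy longest match."""
    # Replace spaces with the meta symbol so word boundaries are kept.
    normalized = SPACE + text.strip().replace(" ", SPACE)
    pieces: List[str] = []
    i = 0
    while i < len(normalized):
        # Try the longest candidate first and shrink until one matches.
        for j in range(min(len(normalized), i + MAX_PIECE_LEN), i, -1):
            if normalized[i:j] in PIECE_TO_ID:
                pieces.append(normalized[i:j])
                i = j
                break
        else:
            # No vocabulary entry starts here: emit the unknown piece
            # and skip one character.
            pieces.append(UNK)
            i += 1
    return pieces


def encode_as_ids(text: str) -> List[int]:
    """Map text to the integer IDs of its pieces."""
    return [PIECE_TO_ID[piece] for piece in encode_as_pieces(text)]
```

With this toy vocabulary, `encode_as_pieces("raw text tokenizers")` returns `['▁raw', '▁text', '▁token', 'iz', 'er', 's']` and `encode_as_ids` returns `[2, 3, 4, 5, 6, 8]`. Any character the vocabulary cannot cover maps to `<unk>` (ID 0), which is why the vocabulary size stays fixed regardless of the input.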

natural-language-processing machine-translation text-generation text-preparation neural-networks

About sentencepiece-jni

levyfan/sentencepiece-jni

Java JNI wrapper for SentencePiece: unsupervised text tokenizer for Neural Network-based text generation.

This project helps Java developers integrate SentencePiece, an unsupervised text tokenizer, into their applications. It takes raw text and converts it into numerical IDs or subword pieces, which are then used as input for neural network-based text generation models. AI/ML engineers and data scientists working with Java will find this useful for preparing text data.

natural-language-processing machine-learning-engineering text-generation java-development data-preprocessing

Scores updated daily from GitHub, PyPI, and npm data.