georg-jung/FastBertTokenizer

Fast and memory-efficient library for WordPiece tokenization as it is used by BERT.

Quality score: 47 / 100 (Emerging)

This tool helps AI developers working with .NET process large amounts of text efficiently for BERT models. It takes raw text as input and converts it into numerical token IDs, along with attention masks and token type IDs, ready to feed into a machine-learning model. The ideal user is a developer building AI applications or services in a .NET environment that rely on BERT-style text processing.

Use this if you need to quickly and memory-efficiently prepare text for BERT models within a .NET application, especially when processing large datasets.

Not ideal if your AI application is not built on .NET, or if you need sentence-pair encoding (two text segments joined by a separator token).
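The encode step described above maps a string to the three parallel sequences BERT expects. A minimal C# sketch of that flow, based on the library's README-style usage — the exact method names (`LoadFromHuggingFaceAsync`, `Encode`) and signatures may differ between versions, so treat this as an illustration rather than a definitive API reference:

```csharp
using FastBertTokenizer; // NuGet package: FastBertTokenizer

// Hedged sketch: API surface assumed from the project's documented usage.
var tokenizer = new BertTokenizer();

// Fetches the vocabulary/tokenizer config for the given Hugging Face model id.
await tokenizer.LoadFromHuggingFaceAsync("bert-base-uncased");

// Encode returns the three parallel sequences a BERT model consumes:
// input IDs, attention mask, and token type IDs.
var (inputIds, attentionMask, tokenTypeIds) =
    tokenizer.Encode("FastBertTokenizer runs on .NET.", maximumTokens: 512);

Console.WriteLine($"token count: {inputIds.Length}");
```

For batch workloads, the library is designed so that encoding large corpora stays allocation-light, which is where the "fast and memory-efficient" claim above applies.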

natural-language-processing machine-learning-engineering text-pre-processing dotnet-development ai-application-development
Package: none published
Dependents: none
Maintenance 6 / 25
Adoption 8 / 25
Maturity 16 / 25
Community 17 / 25


Stars: 53
Forks: 11
Language: C#
License: MIT
Category: bpe-tokenizers
Last pushed: Nov 16, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/georg-jung/FastBertTokenizer"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.