BobMcDear/minbpe-hs

Byte-level byte pair encoding (BPE) in Haskell

Score: 27 / 100 (Experimental)

This project implements byte-level Byte Pair Encoding (BPE) in Haskell. Training takes a raw text corpus as input and produces a set of merge rules and a vocabulary, which can then be used to tokenize text. Haskell developers can integrate it into applications such as natural language processing pipelines, where tokenization and text compression are central.
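To make the corpus-in, merge-rules-out idea concrete, here is a minimal sketch of byte-level BPE training. It is not minbpe-hs's actual API: the names `pairCounts`, `mostFrequent`, `mergePair`, `train`, and `asciiBytes` are hypothetical, the input is assumed to be ASCII so that each character maps to one byte, and ties between equally frequent pairs are broken in favor of the smallest pair, which is a choice made for this example only.

```haskell
module Main where

import Data.Char (ord)
import Data.List (foldl')
import qualified Data.Map.Strict as Map

-- Token ids 0..255 stand for raw bytes; ids from 256 up are learned merges.
type Token = Int
type Pair  = (Token, Token)

-- Assumption: ASCII-only input, so every Char is exactly one byte.
-- A real byte-level tokenizer would UTF-8 encode the text first.
asciiBytes :: String -> [Token]
asciiBytes = map ord

-- Count every adjacent pair (overlapping occurrences included).
pairCounts :: [Token] -> Map.Map Pair Int
pairCounts ts = Map.fromListWith (+) [(p, 1) | p <- zip ts (drop 1 ts)]

-- Most frequent pair; on ties the smallest pair wins, because
-- Map.toList is ascending and only a strictly larger count replaces it.
mostFrequent :: Map.Map Pair Int -> Maybe Pair
mostFrequent counts = fst <$> foldl' pick Nothing (Map.toList counts)
  where
    pick Nothing e = Just e
    pick (Just best) e
      | snd e > snd best = Just e
      | otherwise        = Just best

-- Replace each non-overlapping occurrence of the pair, left to right.
mergePair :: Pair -> Token -> [Token] -> [Token]
mergePair (a, b) new = go
  where
    go (x : y : rest)
      | x == a && y == b = new : go rest
      | otherwise        = x : go (y : rest)
    go xs = xs

-- Learn up to numMerges merge rules; returns the rules in the order
-- they were learned, plus the corpus re-encoded with those rules.
train :: Int -> [Token] -> ([(Pair, Token)], [Token])
train numMerges = go 256 []
  where
    go next merges ts
      | next >= 256 + numMerges = (reverse merges, ts)
      | otherwise =
          case mostFrequent (pairCounts ts) of
            Nothing -> (reverse merges, ts)
            Just p  -> go (next + 1) ((p, next) : merges) (mergePair p next ts)

main :: IO ()
main = do
  let (merges, encoded) = train 3 (asciiBytes "aaabdaaabac")
  print merges
  print encoded
```

With three merges on the 11-byte input `"aaabdaaabac"`, this sketch learns the rules (97,97) → 256, (97,98) → 257, and (256,257) → 258 and shortens the sequence to five tokens, `[258,100,258,97,99]`. The vocabulary is implied by the rules: the 256 byte values plus one entry per merge.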

No commits in the last 6 months.

Use this if you are a Haskell developer looking for a functional and performant implementation of byte-level Byte Pair Encoding for text tokenization and compression.

Not ideal if your input text contains non-ASCII characters and you need exact compatibility with Python's regex-based BPE tokenizers.

Tags: Haskell development, text processing, data compression, natural language processing, functional programming
Status: Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 6 / 25
Maturity 16 / 25
Community 5 / 25


Stars: 17
Forks: 1
Language: Haskell
License: MIT
Category: bpe-tokenizers
Last pushed: May 27, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/BobMcDear/minbpe-hs"

The API is open to everyone: 100 requests/day with no key needed, or 1,000 requests/day with a free key.