JuliaText/WordTokenizers.jl

High performance tokenizers for natural language processing and other related tasks

Score: 45 / 100 (Emerging)

This project breaks raw text into units such as individual words and sentences, which is usually the first step in text analysis. It takes a block of text as input and returns a list of words or sentences. It is useful to anyone working with text data for research, content analysis, or language processing.
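To make the input and output concrete, here is a minimal Python sketch of word and sentence segmentation. It is an illustration only and not the algorithm WordTokenizers.jl uses. The function names `naive_split_sentences` and `naive_tokenize` are hypothetical, and the regular-expression rules are a deliberately simple assumption.

```python
import re


def naive_split_sentences(text: str) -> list[str]:
    """Split text into sentences (hypothetical helper, not the library's method).

    Naive rule: a sentence ends at '.', '!' or '?' followed by whitespace.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def naive_tokenize(sentence: str) -> list[str]:
    """Split a sentence into word and punctuation tokens (hypothetical helper).

    A token is either a run of ASCII letters, digits, or apostrophes,
    or a single other non-space character such as punctuation.
    """
    return re.findall(r"[A-Za-z0-9']+|[^\sA-Za-z0-9']", sentence)


text = "Tokenizers split text. Don't they?"
sentences = naive_split_sentences(text)
tokens = [naive_tokenize(s) for s in sentences]
```

Here `sentences` holds two strings and `tokens` holds one list of tokens per sentence. Rules this simple mis-split abbreviations, URLs, and non-ASCII text, which is why dedicated tokenizers exist.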

100 stars. No commits in the last 6 months.

Use this if you need to precisely segment text into words or sentences for further analysis, especially if you're working with diverse languages or specific text formats like social media posts.

Not ideal if you're looking for a complete natural language understanding solution, as this tool focuses solely on text segmentation and not on deeper linguistic analysis like part-of-speech tagging or sentiment analysis.

text-processing natural-language-analysis content-preparation language-research data-preprocessing
Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 9 / 25
Maturity 16 / 25
Community 20 / 25


Stars: 100
Forks: 25
Language: Julia
License:
Last pushed: Dec 30, 2021
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/JuliaText/WordTokenizers.jl"

Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
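The same request can be made from a script. This is a minimal Python sketch using only the standard library, under some assumptions: the page does not document the response format, so parsing the body as JSON is a guess. The names `quality_url` and `fetch_quality` are hypothetical, and reading the `nlp` path segment as a category is inferred only from the URL above.

```python
import json
import urllib.request

# Base of the endpoint shown in the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, repo: str) -> str:
    """Build the endpoint URL (hypothetical helper).

    `category` is the path segment before the repository ("nlp" in the
    curl example); `repo` is the "owner/name" repository identifier.
    """
    return f"{BASE_URL}/{category}/{repo}"


def fetch_quality(category: str, repo: str, timeout: float = 10.0):
    """Request the data and decode it, ASSUMING the body is JSON."""
    with urllib.request.urlopen(quality_url(category, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("nlp", "JuliaText/WordTokenizers.jl")` requests the same URL as the curl command. Without a key, the 100 requests/day limit applies, so a script that polls should cache results.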