bab2min/kiwi-farm

A deep learning language model lab built on the Kiwi morphological analyzer

Quality score: 16 / 100 (Experimental)

For those working with Korean text in deep learning language models, this project offers a tokenizer built on the Kiwi morphological analyzer. It takes raw Korean text and converts it into tokens drawn from a fixed vocabulary, the form that deep learning models like BERT or GPT consume, and it is designed around the particular challenges of Korean language processing. It is aimed at AI/ML engineers, data scientists, and researchers who need to pre-process Korean text for natural language processing tasks.
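As a rough illustration of what mapping text onto a fixed vocabulary means, here is a minimal sketch. It does not use kiwi-farm or Kiwi: the morpheme lists are segmented by hand, and the `VOCAB` table and `encode` function are hypothetical names introduced only for this example.

```python
# Toy fixed vocabulary: token string -> integer id.
# These entries are illustrative and are NOT taken from kiwi-farm.
VOCAB = {"[UNK]": 0, "학교": 1, "에": 2, "가": 3, "다": 4}


def encode(morphemes, vocab=VOCAB):
    """Map each morpheme to its id, falling back to [UNK] when it is missing."""
    unk_id = vocab["[UNK]"]
    return [vocab.get(m, unk_id) for m in morphemes]


# Hand-segmented morphemes (assumption: a morphological analyzer
# would normally produce this list from raw text).
ids = encode(["학교", "에", "가", "다"])
ids_unk = encode(["도서관", "에", "가", "다"])  # "도서관" is out of vocabulary

print(ids)
print(ids_unk)
```

Because the vocabulary is fixed, any morpheme outside it collapses to the single `[UNK]` id, which is why the quality of the morphological segmentation matters for downstream models.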

No commits in the last 6 months.

Use this if you are developing or fine-tuning deep learning language models specifically for Korean and need a robust tokenizer that handles the language's complex morphology and common issues like misspellings or inconsistent spacing.

Not ideal if your primary focus is on languages other than Korean, or if you only need basic text splitting without advanced morphological analysis.

Tags: Korean NLP, natural-language-processing, deep-learning-language-models, text-preprocessing, machine-learning-research
No License, Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 8 / 25
Maturity 8 / 25
Community 0 / 25

Stars: 52
Forks: (not listed)
Language: Python
License: None
Last pushed: May 17, 2023
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/bab2min/kiwi-farm"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
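The same request can be made from Python using only the standard library. This is a sketch under assumptions: the base URL and path come from the curl command above, but the parameter name `category` for the `transformers` path segment and the helper names are my own, and the response format is not documented here, so the body is returned as raw text rather than parsed.

```python
import urllib.request
from urllib.parse import quote

# Base URL taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL; 'category' is an assumed name for the first path segment."""
    parts = (quote(p, safe="") for p in (category, owner, repo))
    return f"{BASE_URL}/{'/'.join(parts)}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0) -> str:
    """Send a GET request and return the response body as text (format assumed unknown)."""
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


url = build_quality_url("transformers", "bab2min", "kiwi-farm")
print(url)
# To perform the actual request (needs network access):
# print(fetch_quality("transformers", "bab2min", "kiwi-farm"))
```

Keeping URL construction separate from the network call makes it easy to check the request target without spending any of the daily request quota.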