bab2min/kiwi-farm
A deep-learning language-model lab built on the Kiwi morphological analyzer
For those working with Korean text in deep learning language models, this project offers an advanced 'tokenizer' built on the Kiwi morphological analyzer. It takes raw Korean text and converts it into a fixed set of vocabulary 'tokens' that deep learning models like BERT or GPT can understand, addressing the unique challenges of Korean language processing. This is ideal for AI/ML engineers, data scientists, or researchers who need to pre-process Korean text for natural language processing tasks.
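The core idea — segmenting text into morphemes and mapping them onto a fixed vocabulary of integer IDs — can be illustrated with a toy sketch. This is a hand-rolled example for intuition only, not the project's actual API: in the real tokenizer, the morpheme segmentation comes from the Kiwi analyzer, and the vocabulary below is invented for the demo.

```python
# Toy morpheme-to-ID mapping with an <unk> fallback, the basic contract a
# tokenizer fulfills for models like BERT or GPT. Vocabulary is illustrative.
VOCAB = {"<unk>": 0, "나": 1, "는": 2, "학교": 3, "에": 4, "가": 5, "ㄴ다": 6}

def encode(morphemes):
    """Map a list of morphemes to vocabulary IDs, unknowns to <unk> (0)."""
    return [VOCAB.get(m, VOCAB["<unk>"]) for m in morphemes]

# "나는 학교에 간다" ("I go to school"), pre-segmented into morphemes:
print(encode(["나", "는", "학교", "에", "가", "ㄴ다"]))  # [1, 2, 3, 4, 5, 6]
print(encode(["나", "는", "커피"]))  # "커피" is out-of-vocabulary: [1, 2, 0]
```

In practice the interesting work happens before this step: a morphological analyzer keeps the vocabulary small and the unknown rate low by splitting inflected Korean words into reusable morphemes.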
No commits in the last 6 months.
Use this if you are developing or fine-tuning deep learning language models specifically for Korean and need a robust tokenizer that handles the language's complex morphology and common issues like misspellings or inconsistent spacing.
Not ideal if your primary focus is on languages other than Korean, or if you only need basic text splitting without advanced morphological analysis.
Stars: 52
Forks: —
Language: Python
License: —
Category: —
Last pushed: May 17, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/bab2min/kiwi-farm"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
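The same request can be made from Python. The endpoint path below is taken directly from the curl example; the `X-API-Key` header name is an assumption for illustration — check the service's documentation for the actual authentication scheme.

```python
# Build the quality-API URL for a repository; mirrors the curl example above.
from urllib.parse import quote

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category, owner, repo):
    """Return the API URL for a given category and owner/repo pair."""
    return f"{BASE}/{quote(category)}/{quote(owner)}/{quote(repo)}"

url = quality_url("transformers", "bab2min", "kiwi-farm")
print(url)
# Free tier needs no headers; with a key (assumed header name, verify first):
#   requests.get(url, headers={"X-API-Key": "YOUR_KEY"})
```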
Higher-rated alternatives
SKTBrain/KoBERT
Korean BERT pre-trained cased (KoBERT)
monologg/KoELECTRA
Pretrained ELECTRA Model for Korean
monologg/KoBERT-Transformers
KoBERT on 🤗 Huggingface Transformers 🤗 (with Bug Fixed)
VinAIResearch/PhoBERT
PhoBERT: Pre-trained language models for Vietnamese (EMNLP-2020 Findings)
KB-AI-Research/KB-ALBERT
A Korean ALBERT model specialized for the economics/finance domain, provided by KB Kookmin Bank