bab2min/kiwi-farm
A deep-learning language-model lab built on the Kiwi morphological analyzer
For those working with Korean text in deep learning language models, this project offers an advanced 'tokenizer' built on the Kiwi morphological analyzer. It takes raw Korean text and converts it into a fixed set of vocabulary 'tokens' that deep learning models like BERT or GPT can understand, addressing the unique challenges of Korean language processing. This is ideal for AI/ML engineers, data scientists, or researchers who need to pre-process Korean text for natural language processing tasks.
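The core idea — segmenting text into morphemes and mapping them onto a fixed vocabulary of integer IDs — can be illustrated with a toy sketch. This is a hand-rolled example for intuition only, not the project's actual API: in the real tokenizer, the morpheme segmentation comes from the Kiwi analyzer, and the vocabulary below is invented for the demo.

```python
# Toy morpheme-to-ID mapping with an <unk> fallback, the basic contract a
# tokenizer fulfills for models like BERT or GPT. Vocabulary is illustrative.
VOCAB = {"<unk>": 0, "나": 1, "는": 2, "학교": 3, "에": 4, "가": 5, "ㄴ다": 6}

def encode(morphemes):
    """Map a list of morphemes to vocabulary IDs, unknowns to <unk> (0)."""
    return [VOCAB.get(m, VOCAB["<unk>"]) for m in morphemes]

# "나는 학교에 간다" ("I go to school"), pre-segmented into morphemes:
print(encode(["나", "는", "학교", "에", "가", "ㄴ다"]))  # [1, 2, 3, 4, 5, 6]
print(encode(["나", "는", "커피"]))  # "커피" is out-of-vocabulary: [1, 2, 0]
```

In practice the interesting work happens before this step: a morphological analyzer keeps the vocabulary small and the unknown rate low by splitting inflected Korean words into reusable morphemes.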
No commits in the last 6 months.
Use this if you are developing or fine-tuning deep learning language models specifically for Korean and need a robust tokenizer that handles the language's complex morphology and common issues like misspellings or inconsistent spacing.
Not ideal if your primary focus is on languages other than Korean, or if you only need basic text splitting without advanced morphological analysis.
Stars: 52
Forks: —
Language: Python
License: —
Category: —
Last pushed: May 17, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/bab2min/kiwi-farm"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
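The same request can be made from Python. The endpoint path below is taken directly from the curl example; the `X-API-Key` header name is an assumption for illustration — check the service's documentation for the actual authentication scheme.

```python
# Build the quality-API URL for a repository; mirrors the curl example above.
from urllib.parse import quote

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category, owner, repo):
    """Return the API URL for a given category and owner/repo pair."""
    return f"{BASE}/{quote(category)}/{quote(owner)}/{quote(repo)}"

url = quality_url("transformers", "bab2min", "kiwi-farm")
print(url)
# Free tier needs no headers; with a key (assumed header name, verify first):
#   requests.get(url, headers={"X-API-Key": "YOUR_KEY"})
```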
Higher-rated alternatives
SKTBrain/KoBERT
Korean BERT pre-trained cased (KoBERT)
monologg/KoELECTRA
Pretrained ELECTRA Model for Korean
monologg/KoBERT-Transformers
KoBERT on 🤗 Huggingface Transformers 🤗 (with Bug Fixed)
VinAIResearch/PhoBERT
PhoBERT: Pre-trained language models for Vietnamese (EMNLP-2020 Findings)
KB-AI-Research/KB-ALBERT
A Korean ALBERT model specialized for the economics/finance domain, provided by KB Kookmin Bank