howl-anderson/MITIE_Chinese_Wikipedia_corpus
MITIE word-representation model pre-trained on Chinese Wikipedia
This project offers a pre-trained word-representation model for Chinese text, built with MITIE from a large Chinese Wikipedia dataset. It maps raw Chinese text to feature representations that capture word meaning, which is the foundation for applications that need to process and interpret human language. It is aimed at developers building natural language processing (NLP) solutions for Chinese.
No commits in the last 6 months.
Use this if you are a developer building a Chinese natural language processing system and need a robust, pre-trained word representation model to save significant training time and computational resources.
Not ideal if you are an end-user looking for a ready-to-use application, as this project provides a technical component for developers rather than a direct user-facing tool.
Stars
51
Forks
9
Language
—
License
MIT
Category
NLP
Last pushed
Sep 09, 2018
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/howl-anderson/MITIE_Chinese_Wikipedia_corpus"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
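A minimal sketch of calling this endpoint from Python, using only the standard library. The URL layout (`quality/<category>/<owner>/<repo>`) is taken from the curl example above; the shape of the JSON response is not documented here, so the fetch is left as a commented-out step and no field names are assumed.

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, repo: str) -> str:
    """Build the API URL for a repository, following the curl example's path layout."""
    return f"{BASE}/{category}/{repo}"

url = quality_url("nlp", "howl-anderson/MITIE_Chinese_Wikipedia_corpus")
print(url)

# Uncomment to fetch on the free tier (100 requests/day, no key needed).
# The response is assumed to be JSON; its fields depend on the API's schema.
# with urllib.request.urlopen(url) as resp:
#     data = json.load(resp)
#     print(data)
```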
Higher-rated alternatives
NateScarlet/holiday-cn
📅🇨🇳 China statutory public holiday data, automatically scraped daily from State Council announcements
sagorbrur/bnlp
BNLP is a natural language processing toolkit for Bengali Language.
brightmart/nlp_chinese_corpus
Large Scale Chinese Corpus for NLP
houbb/sensitive-word
👮‍♂️ Sensitive-word filtering tool for Java (sensitive/banned/illegal/profane words; high-performance Java implementation based on the DFA algorithm...
esbatmop/MNBVC
MNBVC(Massive Never-ending BT Vast Chinese...