zhanlaoban/NLP_PEMDC
NLP Pretrained Embeddings, Models and Datasets Collection (NLP_PEMDC). The collection is updated regularly.
This is a continuously updated collection of pre-trained word embeddings, models, and datasets for natural language processing (NLP), primarily focused on Chinese. It provides ready-to-use components for building text-based applications such as classification, sentiment analysis, and question answering from raw Chinese (and some English) text. It is aimed at NLP researchers, data scientists, and students working on text analysis, especially with Chinese-language data.
No commits in the last 6 months.
Use this if you need a convenient, centralized resource for Chinese NLP components, including word vectors, pre-trained language models like BERT and RoBERTa, and diverse datasets for tasks like sentiment analysis, text classification, and reading comprehension.
Not ideal if you are looking for a plug-and-play NLP application or a library with high-level APIs for immediate integration into production systems, as this is a collection of resources for learning and research.
Stars
65
Forks
15
Language
—
License
—
Category
—
Last pushed
Jan 14, 2020
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/zhanlaoban/NLP_PEMDC"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
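For programmatic access, the endpoint path can be built from the owner and repository name. This is a minimal sketch, assuming the URL follows the owner/repo pattern shown in the curl example above; the `quality_url` helper is hypothetical, not part of the API itself.

```python
import urllib.parse

# Base path inferred from the curl example above (an assumption).
BASE = "https://pt-edge.onrender.com/api/v1/quality/nlp"

def quality_url(owner: str, repo: str) -> str:
    """Build the per-repository quality endpoint URL.

    Path segments are percent-encoded so that unusual repository
    names do not break the URL.
    """
    return f"{BASE}/{urllib.parse.quote(owner, safe='')}/{urllib.parse.quote(repo, safe='')}"

print(quality_url("zhanlaoban", "NLP_PEMDC"))
# The resulting URL can then be fetched with curl or any HTTP client,
# subject to the 100 requests/day anonymous limit noted above.
```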
Higher-rated alternatives
acl-org/acl-anthology
Data and software for building the ACL Anthology.
anoopkunchukuttan/indic_nlp_library
Resources and tools for Indian language Natural Language Processing
CLUEbenchmark/CLUECorpus2020
Large-scale pre-training corpus for Chinese (100 GB)
KennethEnevoldsen/scandinavian-embedding-benchmark
A Scandinavian Benchmark for sentence embeddings
Separius/awesome-sentence-embedding
A curated list of pretrained sentence and word embedding models