sanjanalreddy/NLP-Datasets

List of NLP Datasets

Experimental

This is a curated list of datasets for people building conversational AI or natural language processing systems. It covers several kinds of conversational data, from simple single-turn dialogues to complex multi-turn interactions and customer support logs. Researchers and developers building chatbots, virtual assistants, or other systems that process human language can use it to find data for training and evaluating their models.

No commits in the last 6 months.

Use this if you need pre-existing, labeled text data to train or test your natural language processing models, especially for tasks involving dialogue, conversation, or understanding customer interactions.

Not ideal if you are looking for a tool to process or analyze text, rather than a raw collection of datasets.

conversational-ai chatbot-development natural-language-understanding dialogue-system-training customer-support-automation

No license, stale for 6 months, no package, no dependents

Maintenance 0 / 25

Adoption 5 / 25

Maturity 8 / 25

Community 0 / 25

Higher-rated alternatives

acl-org/acl-anthology

Data and software for building the ACL Anthology.

anoopkunchukuttan/indic_nlp_library

Resources and tools for Indian language Natural Language Processing

CLUEbenchmark/CLUECorpus2020

Large-scale pre-training corpus for Chinese (100 GB)

KennethEnevoldsen/scandinavian-embedding-benchmark

A Scandinavian Benchmark for sentence embeddings

Separius/awesome-sentence-embedding

A curated list of pretrained sentence and word embedding models
