amir9ume/urdu_ghazals_rekhta

Dataset for Urdu Ghazals

/ 100

Emerging

This project provides a collection of classical Urdu ghazals, a popular form of South Asian poetry, organized by author and available in Urdu, Hindi, and English transliteration. It offers text data for natural language processing tasks, particularly for Urdu, which is considered a 'low-resource' language. Researchers and students in computational linguistics or digital humanities who focus on South Asian languages will find it useful.

No commits in the last 6 months.

Use this if you are a researcher or student in computational linguistics looking for a structured dataset of Urdu ghazals to analyze or to use in language-model experiments.

Not ideal if you are trying to train a large-scale transformer model from scratch, as the dataset size is relatively small for such an intensive task.
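As a rough illustration of the analysis use case, the sketch below counts ghazal files per author in a local copy of the dataset. The directory layout it assumes (`<root>/<author>/<script>/<file>`, with a script folder such as `ur`) and the function name `count_ghazals` are hypothetical and not confirmed by this listing; adjust them to match the actual repository.

```python
from collections import Counter
from pathlib import Path


def count_ghazals(root, script="ur"):
    """Count ghazal files per author.

    ASSUMPTION: files live under <root>/<author>/<script>/, where <script>
    is a folder such as "ur". This layout is hypothetical.
    """
    counts = Counter()
    for author_dir in sorted(Path(root).iterdir()):
        script_dir = author_dir / script
        # Skip loose files and authors without a folder for this script.
        if script_dir.is_dir():
            counts[author_dir.name] = sum(
                1 for path in script_dir.iterdir() if path.is_file()
            )
    return counts
```

Calling `count_ghazals("path/to/local/copy").most_common(5)` would list the five authors with the most files, which is a quick way to check how small the corpus is before planning an experiment.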

Urdu-poetry South-Asian-literature computational-linguistics digital-humanities low-resource-languages

Stale (6 months), no package, no dependents

Maintenance 0 / 25

Adoption 6 / 25

Maturity 16 / 25

Community 17 / 25

Language: Jupyter Notebook

License: MIT

Higher-rated alternatives

acl-org/acl-anthology

Data and software for building the ACL Anthology.

anoopkunchukuttan/indic_nlp_library

Resources and tools for Indian language Natural Language Processing

CLUEbenchmark/CLUECorpus2020

Large-scale Pre-training Corpus for Chinese (100 GB Chinese pre-training corpus)

KennethEnevoldsen/scandinavian-embedding-benchmark

A Scandinavian Benchmark for sentence embeddings

Separius/awesome-sentence-embedding

A curated list of pretrained sentence and word embedding models
