amir9ume/urdu_ghazals_rekhta

Dataset for Urdu Ghazals

Score: 39 / 100 (Emerging)

This project provides a collection of classical Urdu ghazals, a popular form of South Asian poetry, organized by author and available in Urdu, Hindi, and English transliteration. It supplies text data for natural language processing work on Urdu, which is considered a low-resource language. It is aimed at researchers and students in computational linguistics or digital humanities who focus on South Asian languages.

No commits in the last 6 months.

Use this if you are a researcher or student in computational linguistics looking for a structured dataset of Urdu ghazals to analyze or experiment with language models.

Not ideal if you are trying to train a large-scale transformer model from scratch, as the dataset size is relatively small for such an intensive task.
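
As a starting point for the kind of analysis described above, here is a minimal Python sketch that counts ghazal files per author. The directory layout (root/<author>/<script>/<one file per ghazal>) and the script folder name "ur" are assumptions made for illustration, and count_ghazals_by_author is a hypothetical helper, not part of the project. Check the repository's actual structure before relying on it.

```python
from collections import Counter
from pathlib import Path


def count_ghazals_by_author(root, script="ur"):
    """Count files per author under root/<author>/<script>/.

    ASSUMPTION: the layout and the script folder name are hypothetical;
    adjust both to match the repository's real structure.
    """
    counts = Counter()
    for author_dir in sorted(Path(root).iterdir()):
        script_dir = author_dir / script
        # Skip top-level files and authors without this script folder.
        if not script_dir.is_dir():
            continue
        counts[author_dir.name] = sum(
            1 for path in script_dir.iterdir() if path.is_file()
        )
    return counts
```

Calling count_ghazals_by_author("path/to/dataset").most_common(5) would list the five authors with the most files, which gives a quick sense of how small and how unevenly distributed the corpus is.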

Urdu-poetry South-Asian-literature computational-linguistics digital-humanities low-resource-languages
Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 6 / 25
Maturity 16 / 25
Community 17 / 25

Stars: 20
Forks: 9
Language: Jupyter Notebook
License: MIT
Last pushed: Aug 14, 2023
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/amir9ume/urdu_ghazals_rekhta"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
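
The same request can be made from Python using only the standard library. This is a minimal sketch: the endpoint URL comes from the curl command above, but the function names are hypothetical, and treating the response body as JSON is an assumption, since the response format is not documented here.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category, owner, repo):
    """Build the endpoint URL, percent-encoding each path segment."""
    path = "/".join(quote(part, safe="") for part in (category, owner, repo))
    return f"{API_BASE}/{path}"


def fetch_quality(category, owner, repo, timeout=10):
    """Fetch the quality record for one repository.

    ASSUMPTION: the body is JSON; if it is not, json.loads raises
    json.JSONDecodeError and the raw bytes should be inspected instead.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, fetch_quality("nlp", "amir9ume", "urdu_ghazals_rekhta") requests the same URL as the curl command. With the keyless limit of 100 requests/day, it is worth caching the result locally rather than calling the endpoint in a loop.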