CanCLID/sentences
Cantonese dialogue corpus (粵語對話語料)
This project helps you collect and clean Cantonese sentences for speech recognition. You supply raw Cantonese text, and the project guides you through turning it into clean, standardized sentences suitable for training AI models. It is aimed at language enthusiasts, researchers, and AI developers building Cantonese voice applications.
No commits in the last 6 months.
Use this if you need high-quality, standardized Cantonese sentence data to train speech recognition systems or other natural language processing tools.
Not ideal if you need a dataset that includes mixed English and Cantonese, numbers, abbreviations, or extensive punctuation.
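As a rough illustration of what "clean, standardized" could mean given the exclusions listed above, the sketch below keeps only sentences made of Han characters plus a small set of full-width punctuation marks. The rule set, the `ALLOWED_PUNCTUATION` set, and the function name `is_standardized` are assumptions made for this example; they are not taken from the CanCLID/sentences notebooks.

```python
# Hypothetical filter: NOT the project's actual cleaning rules.
# Assumption: a "standardized" sentence contains only Han characters
# and a few full-width punctuation marks.
ALLOWED_PUNCTUATION = set("，。？！")

# CJK Unified Ideographs: basic block, Extension A, Extension B.
HAN_RANGES = ((0x4E00, 0x9FFF), (0x3400, 0x4DBF), (0x20000, 0x2A6DF))


def is_han(ch: str) -> bool:
    """Return True if ch falls in one of the Han ranges above."""
    code = ord(ch)
    return any(start <= code <= end for start, end in HAN_RANGES)


def is_standardized(sentence: str) -> bool:
    """Reject empty text and anything with Latin letters, digits,
    or punctuation outside the allowed set."""
    text = sentence.strip()
    if not text:
        return False
    return all(is_han(ch) or ch in ALLOWED_PUNCTUATION for ch in text)


raw = ["你食咗飯未？", "我想買iPhone", "今日係5月"]
clean = [s for s in raw if is_standardized(s)]
```

Under these assumed rules only the first sentence in `raw` survives, since the other two contain Latin letters or a digit.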
Stars: 29
Forks: 3
Language: Jupyter Notebook
License: not listed
Category:
Last pushed: May 12, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/CanCLID/sentences"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
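The same request can be made from Python using only the standard library. This is a minimal sketch: the path layout `quality/<category>/<owner>/<repo>` is inferred from the single curl example above, the helper names `quality_url` and `fetch_quality` are hypothetical, and the response is returned as raw text because its format is not documented here. How an API key is supplied is also not stated, so the sketch covers keyless access only.

```python
import urllib.parse
import urllib.request

# Base path copied from the curl example; the segment layout below is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, percent-encoding each path segment."""
    parts = [urllib.parse.quote(p, safe="") for p in (category, owner, repo)]
    return f"{BASE_URL}/{'/'.join(parts)}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0) -> str:
    """Fetch the endpoint and return the response body as text.

    Keyless access is limited to 100 requests/day, so cache results
    rather than calling this in a loop.
    """
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


url = quality_url("nlp", "CanCLID", "sentences")
```

Calling `fetch_quality("nlp", "CanCLID", "sentences")` performs the network request; `quality_url` alone is enough to check that the URL matches the curl command.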