jcblaisecruz02/Filipino-Text-Benchmarks
Open-source benchmark datasets and pretrained transformer models in the Filipino language.
This project offers essential resources for building applications that understand and process text in Filipino, a language often undersupported in AI. It provides pre-trained language models and benchmark datasets covering topics like fake news, hate speech, and health-related text in Filipino. Developers and researchers working on natural language processing (NLP) for Filipino can use these resources to create, evaluate, and improve their models for tasks like text classification and understanding.
No commits in the last 6 months.
Use this if you are an NLP developer or researcher building or evaluating AI models that need to accurately process and understand text written in Filipino.
Not ideal if you are looking for an out-of-the-box, end-user application or an actively maintained codebase, as the repository is no longer maintained.
Stars
64
Forks
9
Language
Python
License
GPL-3.0
Category
Last pushed
Aug 26, 2024
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/jcblaisecruz02/Filipino-Text-Benchmarks"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
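The same request can be made from Python. The following is a minimal sketch, not an official client: the base URL and path layout are taken from the curl command above, while the helper names `build_quality_url` and `fetch_quality` are hypothetical, and the assumption that the response body is JSON is unverified. How a free key is supplied is not documented here, so the sketch covers only the keyless request.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, repo: str) -> str:
    """Build the endpoint URL for a repository given as 'owner/name'.

    Hypothetical helper: it only reproduces the path layout seen in the
    curl example (category, then owner/name).
    """
    # Keep the '/' between owner and repository name; escape everything else.
    return f"{BASE_URL}/{quote(category, safe='')}/{quote(repo, safe='/')}"


def fetch_quality(category: str, repo: str, timeout: float = 10.0):
    """Fetch the record for a repository.

    Assumption (unverified): the endpoint returns a JSON body.
    """
    url = build_quality_url(category, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("nlp", "jcblaisecruz02/Filipino-Text-Benchmarks")` would issue the same GET request as the curl command and counts against the 100 requests/day keyless limit.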
Higher-rated alternatives
acl-org/acl-anthology
Data and software for building the ACL Anthology.
anoopkunchukuttan/indic_nlp_library
Resources and tools for Indian language Natural Language Processing
CLUEbenchmark/CLUECorpus2020
Large-scale Pre-training Corpus for Chinese (100 GB Chinese pre-training corpus)
KennethEnevoldsen/scandinavian-embedding-benchmark
A Scandinavian Benchmark for sentence embeddings
Separius/awesome-sentence-embedding
A curated list of pretrained sentence and word embedding models