jcblaisecruz02/Filipino-Text-Benchmarks

Open-source benchmark datasets and pretrained transformer models in the Filipino language.

38 / 100

Emerging

This project offers essential resources for building applications that understand and process text in Filipino, a language often undersupported in AI. It provides pre-trained language models and benchmark datasets covering topics like fake news, hate speech, and health-related text in Filipino. Developers and researchers working on natural language processing (NLP) for Filipino can use these resources to create, evaluate, and improve their models for tasks like text classification and understanding.

No commits in the last 6 months.

Use this if you are an NLP developer or researcher building or evaluating AI models that need to accurately process and understand text written in Filipino.

Not ideal if you are looking for an out-of-the-box end-user application or an actively maintained codebase; the repository is no longer maintained.

Filipino NLP, natural language processing, text classification, low-resource languages, AI model development

Stale (6m), No Package, No Dependents

Maintenance 0 / 25

Adoption 8 / 25

Maturity 16 / 25

Community 14 / 25
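The page does not say how the overall score is derived. As a minimal sketch, assuming the overall score is simply the sum of the four category scores above (each out of 25), the arithmetic works out like this:

```python
# Category scores as listed on this page; each category is scored out of 25.
# Assumption: the overall score is the plain, unweighted sum of the categories.
scores = {"Maintenance": 0, "Adoption": 8, "Maturity": 16, "Community": 14}
MAX_PER_CATEGORY = 25

total = sum(scores.values())                   # 0 + 8 + 16 + 14
max_total = MAX_PER_CATEGORY * len(scores)     # 4 categories x 25 points

print(f"{total} / {max_total}")  # prints "38 / 100"
```

Under this assumption, the zero Maintenance score accounts for most of the gap to a higher overall rating.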


Language: Python

License: GPL-3.0

Higher-rated alternatives

acl-org/acl-anthology

Data and software for building the ACL Anthology.

anoopkunchukuttan/indic_nlp_library

Resources and tools for Indian language Natural Language Processing

CLUEbenchmark/CLUECorpus2020

Large-scale Pre-training Corpus for Chinese (100 GB)

KennethEnevoldsen/scandinavian-embedding-benchmark

A Scandinavian Benchmark for sentence embeddings

Separius/awesome-sentence-embedding

A curated list of pretrained sentence and word embedding models
