Shengwei-Peng/TOCFL-MultiBench
TOCFL-MultiBench: A multimodal benchmark for evaluating Chinese language proficiency using text, audio, and visual data with deep learning. Features Selective Token Constraint Mechanism (STCM) for enhanced decoding stability.
This project evaluates Chinese language proficiency with multiple-choice questions that combine text, audio, and visual information. It takes multimodal Chinese-language test data as input and reports proficiency metrics, including accuracy and F1 scores. It is aimed at researchers and educators developing or assessing AI models for language evaluation.
No commits in the last 6 months.
Use this if you are a researcher or educational technologist working on advanced AI systems for Chinese language assessment and need a robust benchmark.
Not ideal if you are a language learner looking for a direct study tool or a teacher wanting to grade student work manually.
Stars: 8
Forks: —
Language: Python
License: Apache-2.0
Category: —
Last pushed: Dec 16, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/Shengwei-Peng/TOCFL-MultiBench"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
Higher-rated alternatives
stanfordnlp/axbench
Stanford NLP Python library for benchmarking the utility of LLM interpretability methods
aidatatools/ollama-benchmark
LLM Benchmark for Throughput via Ollama (Local LLMs)
LarHope/ollama-benchmark
Ollama-based benchmark with detailed I/O tokens-per-second reporting; written in Python, with a DeepSeek R1 example.
qcri/LLMeBench
Benchmarking Large Language Models
THUDM/LongBench
LongBench v2 and LongBench (ACL '25 & '24)