Shengwei-Peng/TOCFL-MultiBench
TOCFL-MultiBench: A multimodal benchmark for evaluating Chinese language proficiency using text, audio, and visual data with deep learning. Features Selective Token Constraint Mechanism (STCM) for enhanced decoding stability.
This project evaluates Chinese language proficiency with multiple-choice questions that combine text, audio, and visual information. It takes multimodal Chinese-language test data as input and reports proficiency metrics, including accuracy and F1 scores. It is aimed at researchers and educators developing or assessing AI models for language evaluation.
No commits in the last 6 months.
Use this if you are a researcher or educational technologist working on advanced AI systems for Chinese language assessment and need a robust benchmark.
Not ideal if you are a language learner looking for a direct study tool or a teacher wanting to grade student work manually.
Stars: 8
Forks: —
Language: Python
License: Apache-2.0
Category: —
Last pushed: Dec 16, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/Shengwei-Peng/TOCFL-MultiBench"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
Higher-rated alternatives
stanfordnlp/axbench
Stanford NLP Python library for benchmarking the utility of LLM interpretability methods
aidatatools/ollama-benchmark
LLM Benchmark for Throughput via Ollama (Local LLMs)
LarHope/ollama-benchmark
Ollama-based benchmark with detailed I/O tokens-per-second reporting; written in Python, with a DeepSeek R1 example.
qcri/LLMeBench
Benchmarking Large Language Models
THUDM/LongBench
LongBench v2 and LongBench (ACL '25 & '24)