HiThink-Research/BizFinBench
A Business-Driven Real-World Financial Benchmark for Evaluating LLMs
BizFinBench helps financial professionals assess how well large language models (LLMs) understand and reason about complex financial data and scenarios. It provides over 100,000 real-world financial questions in English and Chinese, spanning tasks such as anomaly attribution, numerical computation, and stock price prediction. Financial analysts, quantitative researchers, and risk managers can use it to evaluate and compare LLMs for their specific financial applications.
Use this if you need to rigorously test and benchmark an LLM's capability to perform financial tasks, understand market trends, or process financial reports.
Not ideal if you need a general-purpose LLM benchmark rather than one focused on the nuance and precision that financial applications require.
Stars: 211
Forks: 10
Language: Python
License: —
Category:
Last pushed: Jan 09, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/HiThink-Research/BizFinBench"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
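The same request can be made from Python. The following is a minimal sketch using only the standard library. The helper names `build_quality_url` and `fetch_quality` are hypothetical, the URL path is copied verbatim from the curl command above, and the assumption that the endpoint returns JSON is not confirmed by this page.

```python
import json
import urllib.request

# Base path copied verbatim from the curl command on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/transformers"


def build_quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for a repository (hypothetical helper)."""
    return f"{BASE_URL}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Send a GET request and parse the body.

    Assumption: the response body is JSON. The page does not document
    the response format, so adjust the parsing if it differs.
    """
    url = build_quality_url(owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example call (performs a network request, counts against the daily limit):
# data = fetch_quality("HiThink-Research", "BizFinBench")
```

Because the keyless tier is limited to 100 requests/day, it may be worth caching the result locally rather than calling `fetch_quality` repeatedly.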
Higher-rated alternatives
stanfordnlp/axbench
Stanford NLP Python library for benchmarking the utility of LLM interpretability methods
aidatatools/ollama-benchmark
LLM Benchmark for Throughput via Ollama (Local LLMs)
LarHope/ollama-benchmark
Ollama-based benchmark with detailed I/O tokens per second. Python, with a DeepSeek R1 example.
qcri/LLMeBench
Benchmarking Large Language Models
THUDM/LongBench
LongBench v2 and LongBench (ACL 25'&24')