HiThink-Research/BizFinBench

A Business-Driven Real-World Financial Benchmark for Evaluating LLMs

32 / 100

Emerging

BizFinBench helps financial professionals assess how well large language models (LLMs) understand and reason about complex financial data and scenarios. It provides over 100,000 real-world financial questions in English and Chinese, spanning tasks such as anomaly attribution, numerical computation, and stock price prediction. Financial analysts, quantitative researchers, and risk managers can use it to evaluate and compare LLMs for their specific financial applications.

211 stars.

Use this if you need to rigorously test and benchmark an LLM's capability to perform financial tasks, understand market trends, or process financial reports.

Not ideal if you need a general-purpose LLM benchmark rather than one focused on the nuances and precision that financial applications require.
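
To give a feel for what benchmarking an LLM on financial questions involves, here is a minimal sketch of a per-task accuracy loop. It is not BizFinBench's own evaluation code: the record fields (`task`, `question`, `answer`), the `model_fn` callable, the toy records, and the 1% numeric tolerance are all assumptions made for illustration, and the repository's actual data schema and metrics may differ.

```python
import math
from collections import defaultdict
from typing import Callable, Iterable


def is_correct(predicted: str, expected: str, rel_tol: float = 1e-2) -> bool:
    """Compare numerically when both strings parse as floats, else as normalized text."""
    try:
        return math.isclose(float(predicted), float(expected), rel_tol=rel_tol)
    except ValueError:
        return predicted.strip().lower() == expected.strip().lower()


def accuracy_by_task(records: Iterable[dict], model_fn: Callable[[str], str]) -> dict:
    """Return {task: fraction of that task's records answered correctly}."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for rec in records:
        totals[rec["task"]] += 1
        if is_correct(model_fn(rec["question"]), rec["answer"]):
            hits[rec["task"]] += 1
    return {task: hits[task] / totals[task] for task in totals}


# Hypothetical toy records; the field names are assumptions, not the real schema.
toy_records = [
    {"task": "numerical_computation", "question": "Revenue 120, cost 90. Profit?", "answer": "30"},
    {"task": "numerical_computation", "question": "Price 50, EPS 5. P/E ratio?", "answer": "10"},
    {"task": "anomaly_attribution", "question": "Why did the stock drop?", "answer": "earnings miss"},
]

# Stand-in for a real LLM call: canned replies keyed by question text.
canned_replies = {
    "Revenue 120, cost 90. Profit?": "30.0",
    "Price 50, EPS 5. P/E ratio?": "12",
    "Why did the stock drop?": "Earnings Miss",
}


def stub_model(question: str) -> str:
    return canned_replies[question]


scores = accuracy_by_task(toy_records, stub_model)
print(scores)
```

Reporting accuracy per task rather than as one pooled number makes it easier to see whether a model is weak at, say, numerical computation while doing well on attribution questions. In practice `stub_model` would be replaced by a call to the LLM under test, and free-text answers would likely need a more forgiving comparison than exact match.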

financial-analysis market-intelligence risk-management quantitative-finance financial-forecasting

No License No Package No Dependents

Maintenance 6 / 25

Adoption 10 / 25

Maturity 7 / 25

Community 9 / 25

Stars

211

Forks

Language

Python

License

—

Higher-rated alternatives

stanfordnlp/axbench

Stanford NLP Python library for benchmarking the utility of LLM interpretability methods

aidatatools/ollama-benchmark

LLM Benchmark for Throughput via Ollama (Local LLMs)

LarHope/ollama-benchmark

Ollama based Benchmark with detail I/O token per second. Python with Deepseek R1 example.

qcri/LLMeBench

Benchmarking Large Language Models

THUDM/LongBench

LongBench v2 and LongBench (ACL 25'&24')
