LLMeBench and llm-optimizer-benchmark
These two tools are **complements**: LLMeBench evaluates already-trained large language models (LLMs) across a range of tasks and capabilities, while llm-optimizer-benchmark evaluates the optimizers used during the *pretraining* phase; they address different stages of the LLM lifecycle.
About LLMeBench
qcri/LLMeBench
Benchmarking Large Language Models
This framework helps you objectively compare how well different large language models (LLMs) perform on specific language tasks, regardless of their provider (such as OpenAI or Hugging Face). You supply a dataset and a task (such as sentiment analysis or question answering), and it produces a detailed report on each model's accuracy and behavior. It is designed for AI researchers, data scientists, and model evaluators who need to rigorously test and select the best LLM for their application.
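The core idea — run several models over the same labeled dataset and report per-model accuracy — can be sketched in a few lines. This is a hypothetical illustration, not LLMeBench's actual API: the `evaluate` harness, the stand-in "models", and the toy sentiment dataset are all assumptions made here for clarity.

```python
# Hypothetical sketch of task-based LLM evaluation (not LLMeBench's real API).
# Each "model" is just a callable text -> label; the harness scores every model
# on the same (text, gold_label) dataset and reports per-model accuracy.

def evaluate(models, dataset):
    """Return {model_name: accuracy} over (text, gold_label) pairs."""
    report = {}
    for name, predict in models.items():
        correct = sum(1 for text, gold in dataset if predict(text) == gold)
        report[name] = correct / len(dataset)
    return report

# Toy sentiment-analysis dataset and two stand-in "models" (illustrative only).
dataset = [("great movie", "pos"), ("terrible plot", "neg"), ("loved it", "pos")]
models = {
    "always_pos": lambda text: "pos",                                  # trivial baseline
    "keyword": lambda text: "neg" if "terrible" in text else "pos",    # crude heuristic
}

print(evaluate(models, dataset))
```

A real harness would add prompt templates, API calls to the model providers, and richer metrics, but the comparison logic is the same: identical data and task for every model, one score per model.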
About llm-optimizer-benchmark
epfml/llm-optimizer-benchmark
Benchmarking Optimizers for LLM Pretraining
This project offers a standardized way to compare the optimization techniques used in training Large Language Models (LLMs). It takes optimizer configurations, model sizes, and training durations as input and produces benchmark results showing which optimizer performs best under each set of conditions. LLM researchers and practitioners can use this to inform their choice of optimization method for pretraining.
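The principle behind such a benchmark — give every optimizer the same loss, the same starting point, and the same step budget, then compare final losses — can be shown on a toy problem. This is a minimal sketch under stated assumptions, not the repo's actual harness: the quadratic loss, hyperparameters, and the hand-rolled SGD and Adam updates are chosen here purely for illustration.

```python
# Hypothetical sketch of an optimizer benchmark (not this repo's actual code):
# run each optimizer for a fixed step budget on the same toy loss
# f(x) = (x - 3)^2 and compare the final loss values.
import math

def grad(x):
    return 2.0 * (x - 3.0)  # derivative of (x - 3)^2

def run_sgd(steps=200, lr=0.1):
    x = 0.0
    for _ in range(steps):
        x -= lr * grad(x)   # plain gradient descent update
    return (x - 3.0) ** 2

def run_adam(steps=200, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    x, m, v = 0.0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = b1 * m + (1 - b1) * g          # first-moment estimate
        v = b2 * v + (1 - b2) * g * g      # second-moment estimate
        m_hat = m / (1 - b1 ** t)          # bias correction
        v_hat = v / (1 - b2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return (x - 3.0) ** 2

results = {"sgd": run_sgd(), "adam": run_adam()}
print(results)
```

A real pretraining benchmark replaces the quadratic with an LLM's training loss and sweeps model sizes and token budgets, but the controlled-comparison structure is the same: fixed conditions, varied optimizer, one result per configuration.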