RedHatResearch/conext24-NetConfEval
Benchmark for evaluating LLMs on network configuration problems.
This project helps network operations engineers evaluate how well large language models can assist with network configuration tasks. It takes high-level network requirements and assesses a model's ability to translate them into formal specifications, API calls, routing algorithms, or low-level device configurations. The output shows which models are most effective for different stages of network setup and management.
No commits in the last 6 months.
Use this if you are a network operations engineer, researcher, or architect looking to understand the current capabilities and limitations of large language models for automating or facilitating network configuration workflows.
Not ideal if you are looking for a ready-to-deploy, production-grade tool to directly configure your network using AI, as this is an evaluation benchmark.
Stars: 34
Forks: 8
Language: Python
License: MIT
Last pushed: Mar 30, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/RedHatResearch/conext24-NetConfEval"
Open to everyone: 100 requests/day with no API key. Get a free key for 1,000 requests/day.
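If you would rather consume the endpoint programmatically than via curl, a minimal Python sketch using the requests library is below. The URL is copied from the command above; the shape of the JSON payload is an assumption, since the response schema is not documented on this page.

import requests

# Endpoint copied verbatim from the curl example above.
URL = ("https://pt-edge.onrender.com/api/v1/quality/"
       "transformers/RedHatResearch/conext24-NetConfEval")

def fetch_quality_record(url: str = URL) -> dict:
    """Fetch the repo quality record and return the parsed JSON body."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # surfaces 4xx/5xx, e.g. daily rate-limit errors
    return response.json()

if __name__ == "__main__":
    record = fetch_quality_record()
    # Field names are not documented here; print the raw payload to inspect them.
    print(record)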
Higher-rated alternatives
stanfordnlp/axbench
Stanford NLP Python library for benchmarking the utility of LLM interpretability methods
aidatatools/ollama-benchmark
LLM Benchmark for Throughput via Ollama (Local LLMs)
LarHope/ollama-benchmark
Ollama-based benchmark reporting detailed I/O tokens per second; written in Python, with a DeepSeek R1 example.
qcri/LLMeBench
Benchmarking Large Language Models
THUDM/LongBench
LongBench v2 and LongBench (ACL '25 & '24)