RedHatResearch/conext24-NetConfEval
Benchmark for evaluating LLMs on network configuration problems.
This project helps network operations engineers evaluate how well large language models can assist with network configuration tasks. It takes high-level network requirements and assesses a model's ability to translate them into formal specifications, API calls, routing algorithms, or low-level device configurations. The output shows which models are most effective for different stages of network setup and management.
No commits in the last 6 months.
Use this if you are a network operations engineer, researcher, or architect looking to understand the current capabilities and limitations of large language models for automating or facilitating network configuration workflows.
Not ideal if you are looking for a ready-to-deploy, production-grade tool to directly configure your network using AI, as this is an evaluation benchmark.
Stars: 34
Forks: 8
Language: Python
License: MIT
Last pushed: Mar 30, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/RedHatResearch/conext24-NetConfEval"
Open to everyone: 100 requests/day with no API key. Get a free key for 1,000 requests/day.
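If you would rather consume the endpoint programmatically than via curl, a minimal Python sketch using the requests library is below. The URL is copied from the command above; the shape of the JSON payload is an assumption, since the response schema is not documented on this page.

import requests

# Endpoint copied verbatim from the curl example above.
URL = ("https://pt-edge.onrender.com/api/v1/quality/"
       "transformers/RedHatResearch/conext24-NetConfEval")

def fetch_quality_record(url: str = URL) -> dict:
    """Fetch the repo quality record and return the parsed JSON body."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # surfaces 4xx/5xx, e.g. daily rate-limit errors
    return response.json()

if __name__ == "__main__":
    record = fetch_quality_record()
    # Field names are not documented here; print the raw payload to inspect them.
    print(record)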
Higher-rated alternatives
stanfordnlp/axbench
Stanford NLP Python library for benchmarking the utility of LLM interpretability methods
aidatatools/ollama-benchmark
LLM Benchmark for Throughput via Ollama (Local LLMs)
LarHope/ollama-benchmark
Ollama-based benchmark reporting detailed I/O tokens per second; written in Python, with a DeepSeek R1 example.
qcri/LLMeBench
Benchmarking Large Language Models
THUDM/LongBench
LongBench v2 and LongBench (ACL '25 & '24)