alibaba/CloudEval-YAML

☁️ Benchmarking LLMs for Cloud Config Generation | LLM Benchmarking for Cloud Scenarios

Score: 26 / 100 (Experimental)

This project helps cloud engineers and application developers evaluate how well different large language models (LLMs) generate accurate YAML configurations for cloud-native systems such as Kubernetes and Envoy. Given an LLM, the benchmark reports scores such as BLEU and exact match, showing how precisely the model produces the required YAML files. It is aimed at anyone responsible for deploying and managing applications in a cloud environment.

No commits in the last 6 months.

Use this if you need to objectively compare and select the best LLM or prompt engineering technique for automating cloud configuration tasks.

Not ideal if you are looking for a tool to deploy or manage cloud applications directly, rather than evaluate the LLMs that generate their configurations.

cloud-engineering DevOps configuration-management LLM-evaluation cloud-native
Badges: Stale (6m) · No Package · No Dependents
Maintenance 0 / 25
Adoption 7 / 25
Maturity 16 / 25
Community 3 / 25


Stars: 39
Forks: 1
Language: Python
License: MIT
Last pushed: Oct 25, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/mlops/alibaba/CloudEval-YAML"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000 requests/day.
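The same endpoint can be queried programmatically. A minimal Python sketch, using only the standard library; the URL path is taken from the curl example above, but the JSON response schema is not documented here, so the script simply pretty-prints whatever the API returns:

```python
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, repo: str) -> str:
    """Build the quality-API URL for a category and an owner/name repo slug."""
    return f"{API_BASE}/{category}/{repo}"


def fetch_quality(category: str, repo: str) -> dict:
    """Fetch and decode the JSON quality report (response schema unspecified)."""
    with urllib.request.urlopen(quality_url(category, repo)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Pretty-print the raw report for this repo.
    report = fetch_quality("mlops", "alibaba/CloudEval-YAML")
    print(json.dumps(report, indent=2))
```

Keeping the no-key tier in mind (100 requests/day), cache responses locally rather than re-fetching on every run.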