microsoft/LLF-Bench

A benchmark for evaluating learning agents based on just language feedback

Score: 45 / 100 (Emerging)

This project provides a set of standardized interactive tasks designed to evaluate how well AI agents learn from natural language feedback rather than traditional numerical rewards or direct action demonstrations. It takes an agent's actions as input and returns rich language descriptions of the environment along with feedback on the agent's progress. The output is a measure of the agent's performance across the tasks, making it useful for AI researchers and developers focused on building more human-like learning systems.
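In practice this interaction is a standard environment loop. The sketch below assumes a Gymnasium-style interface whose observations bundle the task instruction, a state description, and feedback as text; the import name, environment id, and observation keys are illustrative assumptions, not the project's confirmed API.

# Minimal interaction-loop sketch. The import name, environment id, and
# observation keys are illustrative assumptions, not confirmed API.
import gymnasium as gym
import llfbench  # assumed to register the benchmark environments

env = gym.make("llf-gridworld-v0")  # hypothetical environment id
obs, info = env.reset()

for _ in range(20):
    # The observation is assumed to carry natural-language fields:
    # the task instruction, a description of the current state, and
    # feedback on the previous action.
    prompt = "\n".join(str(obs.get(k, "")) for k in ("instruction", "observation", "feedback"))

    # A real agent (e.g. an LLM) would choose its action from the prompt;
    # here a valid action is sampled as a placeholder.
    action = env.action_space.sample()

    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        break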

No commits in the last 6 months.

Use this if you are developing or evaluating AI agents that need to learn complex tasks by understanding and responding to human-like linguistic guidance and explanations.

Not ideal if your AI agent primarily learns through numerical rewards or by observing exact action sequences, without needing to process natural language feedback.

AI-evaluation interactive-learning language-understanding agent-development human-AI-interaction
Stale (6m) · No Package · No Dependents
Maintenance 2 / 25
Adoption 9 / 25
Maturity 16 / 25
Community 18 / 25
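The four category scores add up to the overall rating: 2 + 9 + 16 + 18 = 45 out of 100.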


Stars: 95
Forks: 18
Language: Python
License: MIT
Last pushed: Jun 10, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/microsoft/LLF-Bench"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
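The same data can be fetched from Python. This is a minimal sketch assuming the endpoint returns JSON; the field names in the final line are guesses based on the statistics shown above, not a documented schema.

# Fetch the quality data programmatically (sketch; response schema is assumed).
import requests

url = "https://pt-edge.onrender.com/api/v1/quality/microsoft/LLF-Bench"
resp = requests.get(url, timeout=10)
resp.raise_for_status()
data = resp.json()

# "score", "stars", and "forks" are illustrative field names, not confirmed.
print(data.get("score"), data.get("stars"), data.get("forks"))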