codezakh/DataEnvGym
A testbed for agents and environments that can automatically improve models through data generation.
This project helps AI researchers and practitioners build and evaluate agents that can automatically generate new data to improve machine learning models. You provide a "student" model and a task (like solving math problems or generating code), and the system outputs a data generation agent that can create better training data to enhance the student model's performance. It's designed for those who develop and refine AI models, especially large language models, to achieve better results with less manual data curation.
No commits in the last 6 months.
Use this if you are an AI researcher or machine learning engineer looking to automate the process of creating high-quality training data to improve your models, particularly for multimodal, math, or code generation tasks.
Not ideal if you are a business user looking for a ready-to-use, off-the-shelf data generation solution without any programming or AI model development involvement.
Stars: 28
Forks: 6
Language: Python
License: MIT
Category:
Last pushed: Mar 04, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/codezakh/DataEnvGym"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
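The same request can be made from Python. The sketch below uses only the standard library and the endpoint shown in the curl command above. It assumes the endpoint returns a JSON body, which this page does not confirm, and the helper names `build_url` and `fetch_quality` are hypothetical, chosen here for illustration.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def build_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository (hypothetical helper)."""
    # Percent-encode each path segment so unusual characters cannot break the URL.
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for a repository.

    Assumption: the response body is JSON. If it is not, json.loads
    raises a ValueError and the caller should inspect the raw body instead.
    """
    request = urllib.request.Request(
        build_url(owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Performs a real network call and counts against the daily request limit.
    print(json.dumps(fetch_quality("codezakh", "DataEnvGym"), indent=2))
```

Keeping URL construction separate from the network call makes it possible to check the request target without spending any of the 100 keyless requests per day.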
Higher-rated alternatives
ai4co/reevo
[NeurIPS 2024] ReEvo: Large Language Models as Hyper-Heuristics with Reflective Evolution
SALT-NLP/collaborative-gym
Framework and toolkits for building and evaluating collaborative agents that can work together...
Gen-Verse/LatentMAS
Latent Collaboration in Multi-Agent Systems
lean-dojo/LeanCopilot
LLMs as Copilots for Theorem Proving in Lean
WooooDyy/AgentGym-RL
Code and implementations for the paper "AgentGym-RL: Training LLM Agents for Long-Horizon...