ComplexData-MILA/AIF-Gen
Generating Synthetic Lifelong RL Data for LLMs at Scale
This tool helps machine learning engineers and researchers generate synthetic preference data for training large language models (LLMs). You provide configuration files that specify the LLM's objective and the desired preferences (e.g., explaining a topic as if to a five-year-old vs. to an expert). The tool then outputs a dataset of prompts and AI-generated responses tailored to those preferences. This is useful for teams that need to continually fine-tune LLMs in dynamic environments, such as educational or customer-service applications.
Use this if you need to rapidly create diverse, large-scale synthetic datasets of AI feedback to train your LLMs, especially in scenarios where preferences might evolve over time.
Not ideal if you primarily rely on human-generated feedback for your LLM training or if your data generation needs are small-scale and static.
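To make the inputs and outputs described above more concrete, here is a minimal Python sketch of a lifelong task stream. It is not AIF-Gen's actual configuration schema or API. The `TaskConfig` and `PreferenceSample` types, their fields, and the `expand_stream` helper are hypothetical. The sketch assumes only that each task pairs an objective with a preference and that the output is an ordered sequence of prompt/response records. A real generator would query an LLM where this sketch inserts placeholder text.

```python
from dataclasses import dataclass


# Hypothetical task config: NOT AIF-Gen's real schema.
@dataclass(frozen=True)
class TaskConfig:
    objective: str   # what the LLM should do
    preference: str  # how it should do it


# Hypothetical output record: one prompt and one response.
@dataclass(frozen=True)
class PreferenceSample:
    task_id: int
    prompt: str
    response: str


def expand_stream(stream: list[TaskConfig], prompts_per_task: int) -> list[PreferenceSample]:
    """Expand an ordered stream of task configs into per-task sample records.

    Placeholder strings stand in for LLM-generated prompts and responses,
    so only the shape of the data is shown.
    """
    samples = []
    for task_id, task in enumerate(stream):
        for i in range(prompts_per_task):
            prompt = f"[{task.objective}] prompt #{i}"
            response = f"<response to prompt #{i}, written {task.preference}>"
            samples.append(PreferenceSample(task_id, prompt, response))
    return samples


# The preference shifts from one task to the next, mimicking an evolving environment.
stream = [
    TaskConfig("explain a physics concept", "as if to a five-year-old"),
    TaskConfig("explain a physics concept", "for an expert"),
]
data = expand_stream(stream, prompts_per_task=2)
print(len(data))
print(data[-1].response)
```

Keeping the stream ordered, with a `task_id` on every record, preserves the information a continual fine-tuning setup needs: which samples belong to which stage as preferences change over time.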
Stars: 14
Forks: 1
Language: Python
License: MIT
Category:
Last pushed: Feb 03, 2026
Commits (30d): 0
Higher-rated alternatives
sdv-dev/SDV
Synthetic data generation for tabular data
sdv-dev/SDGym
Benchmarking synthetic data generation methods.
NVIDIA-NeMo/DataDesigner
🎨 NeMo Data Designer: A general library for generating high-quality synthetic data from scratch...
AlexanderVNikitin/tsgm
Generation and evaluation of synthetic time series datasets (also, augmentations,...
mostly-ai/mostlyai
Synthetic Data SDK ✨