nalinrajendran/synthetic-LLM-QA-dataset-generator
Create synthetic datasets for training and testing Large Language Models (LLMs) in a Question-Answering (QA) context.
This tool helps educators, trainers, or content creators quickly build question-and-answer sets from their PDF documents. You input one or more PDF files, and it uses a local language model to generate relevant questions and their corresponding answers. The output is a structured JSON file containing these QA pairs, useful for creating quizzes, study guides, or training materials.
No commits in the last 6 months.
Use this if you need to rapidly create question-answer datasets from your PDF-based content without manually writing each question and answer.
Not ideal if you require highly nuanced, domain-specific questions that necessitate deep human understanding or if you cannot run a local language model.
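The PDF-to-JSON flow described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: `chunk_text`, `build_qa_dataset`, and `fake_generate_qa` are hypothetical names, the text is assumed to be already extracted from the PDFs, and the local language model is replaced by a stub callable so the example runs without any model installed.

```python
import json
from typing import Callable


def chunk_text(text: str, max_chars: int = 1000) -> list[str]:
    """Pack blank-line-separated paragraphs into chunks of roughly max_chars.

    A single paragraph longer than max_chars is kept whole rather than split.
    """
    chunks: list[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks


def build_qa_dataset(
    documents: dict[str, str],
    generate_qa: Callable[[str], list[dict]],
    max_chars: int = 1000,
) -> list[dict]:
    """Run generate_qa over every chunk of every document and collect the pairs."""
    dataset = []
    for source, text in documents.items():
        for chunk in chunk_text(text, max_chars):
            for pair in generate_qa(chunk):
                dataset.append(
                    {
                        "source": source,
                        "question": pair["question"],
                        "answer": pair["answer"],
                    }
                )
    return dataset


def fake_generate_qa(chunk: str) -> list[dict]:
    """Stand-in for a local model call (assumption): answers with the first sentence."""
    first_sentence = chunk.split(".")[0].strip()
    return [
        {"question": "What does this passage state first?", "answer": first_sentence}
    ]


if __name__ == "__main__":
    # Hypothetical input: text that would normally come from a PDF extractor.
    docs = {
        "notes.pdf": "Photosynthesis converts light into chemical energy. "
        "It occurs in chloroplasts."
    }
    print(json.dumps(build_qa_dataset(docs, fake_generate_qa), indent=2))
```

Passing the generator as a callable keeps the chunking and JSON assembly testable on their own; a real run would swap `fake_generate_qa` for a function that prompts whichever local model is available and parses its reply into question/answer dictionaries.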
Stars: 54
Forks: 18
Language: Python
License: none listed
Category:
Last pushed: Jun 01, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/nalinrajendran/synthetic-LLM-QA-dataset-generator"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
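For scripted use, the same request can be made from Python with only the standard library. This is a hedged sketch: the URL pattern is taken from the curl command above, but the helper names `quality_url` and `fetch_quality` are hypothetical, and the assumption that the endpoint returns a JSON body is not confirmed by this page.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl example; everything after it is owner/repo.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository, percent-encoding each part."""
    return f"{API_BASE}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch and decode the response, assuming (unverified) that it is JSON."""
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Performs a real network call; keyless access is limited to 100 requests/day.
    result = fetch_quality("nalinrajendran", "synthetic-LLM-QA-dataset-generator")
    print(json.dumps(result, indent=2))
```

Because the response fields are not documented here, the sketch prints the decoded payload as-is instead of reading specific keys.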
Higher-rated alternatives
InternScience/GraphGen
GraphGen: Enhancing Supervised Fine-Tuning for LLMs with Knowledge-Driven Synthetic Data Generation
timothepearce/synda
A CLI for generating synthetic data
rasinmuhammed/misata
High-performance open-source synthetic data engine. Uses LLMs for schema design and vectorized...
ziegler-ingo/CRAFT
[TACL, EMNLP 2025 Oral] Code, datasets, and checkpoints for the paper "CRAFT Your Dataset:...
ZhuLinsen/FastDatasets
A powerful tool for creating high-quality training datasets for Large Language Models...