nalinrajendran/synthetic-LLM-QA-dataset-generator

Create synthetic datasets for training and testing Large Language Models (LLMs) in a Question-Answering (QA) context.

/ 100

Emerging

This tool helps educators, trainers, or content creators quickly build question-and-answer sets from their PDF documents. You input one or more PDF files, and it uses a local language model to generate relevant questions and their corresponding answers. The output is a structured JSON file containing these QA pairs, useful for creating quizzes, study guides, or training materials.
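The flow described above (document text in, model-generated question-answer pairs, structured JSON out) can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the function names, the record fields (`source`, `question`, `answer`), and the chunk size are all assumptions, PDF text extraction is assumed to have happened already, and `stub_generate_qa` is a hypothetical stand-in for the call to a local language model.

```python
import json
from typing import Callable, Dict, List


def chunk_text(text: str, max_chars: int = 500) -> List[str]:
    """Group paragraphs into chunks of roughly max_chars characters.

    A single paragraph longer than max_chars is kept whole.
    """
    chunks: List[str] = []
    current = ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        # Start a new chunk if adding this paragraph would exceed the limit.
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks


def build_qa_dataset(
    documents: Dict[str, str],
    generate_qa: Callable[[str], Dict[str, str]],
    max_chars: int = 500,
) -> List[Dict[str, str]]:
    """Produce one QA record per chunk of each document.

    `documents` maps a source name to its already-extracted text.
    `generate_qa` is whatever function wraps the local model (assumed here
    to return a dict with "question" and "answer" keys).
    """
    records: List[Dict[str, str]] = []
    for source, text in documents.items():
        for chunk in chunk_text(text, max_chars):
            pair = generate_qa(chunk)
            records.append(
                {
                    "source": source,
                    "question": pair["question"],
                    "answer": pair["answer"],
                }
            )
    return records


def stub_generate_qa(chunk: str) -> Dict[str, str]:
    """Hypothetical placeholder for a local model call (no model is used)."""
    first_sentence = chunk.split(".")[0].strip()
    return {"question": "What does this passage state?", "answer": first_sentence}


if __name__ == "__main__":
    docs = {
        "example.pdf": "Photosynthesis converts light into chemical energy. "
        "It occurs in chloroplasts."
    }
    dataset = build_qa_dataset(docs, stub_generate_qa)
    print(json.dumps(dataset, indent=2))
```

Passing the model call in as a function keeps the sketch independent of any particular local model runtime; swapping `stub_generate_qa` for a real call would leave the JSON output shape unchanged.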

No commits in the last 6 months.

Use this if you need to rapidly create question-answer datasets from your PDF-based content without manually writing each question and answer.

Not ideal if you require highly nuanced, domain-specific questions that necessitate deep human understanding or if you cannot run a local language model.

education content creation knowledge assessment training material development quiz generation

No License · Stale 6m · No Package · No Dependents

Maintenance 2 / 25

Adoption 8 / 25

Maturity 8 / 25

Community 19 / 25

Language: Python

License: none

Higher-rated alternatives

InternScience/GraphGen

GraphGen: Enhancing Supervised Fine-Tuning for LLMs with Knowledge-Driven Synthetic Data Generation

timothepearce/synda

A CLI for generating synthetic data

rasinmuhammed/misata

High-performance open-source synthetic data engine. Uses LLMs for schema design and vectorized...

ziegler-ingo/CRAFT

[TACL, EMNLP 2025 Oral] Code, datasets, and checkpoints for the paper "CRAFT Your Dataset:...

ZhuLinsen/FastDatasets

A powerful tool for creating high-quality training datasets for Large Language Models...
