jehumtine/synthetic_data_generator

This script converts bodies of text into question-and-answer pairs in JSON format using the GPT-4 language model. It extracts text from PDF files, tokenizes that text, generates questions and answers, and saves the results to a JSON file.
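
The four steps described above (extract, tokenize, generate, save) can be sketched roughly as follows. This is a minimal illustration and not the repository's actual code. It assumes the third-party `pypdf` package for extraction, uses simple whitespace splitting as a stand-in for real tokenization, and takes the model call as a hypothetical `ask_model` callable so that no particular GPT-4 client API is implied. All function names here are made up for the example.

```python
import json
from typing import Callable, Dict, List

# Hypothetical type: a function that turns one text chunk into Q&A dicts.
AskModel = Callable[[str], List[Dict[str, str]]]


def extract_pdf_text(path: str) -> str:
    """Return the text of every page, joined by newlines (assumes pypdf is installed)."""
    from pypdf import PdfReader  # third-party; imported lazily so the rest runs without it

    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def chunk_words(text: str, max_words: int = 200) -> List[str]:
    """Split text into chunks of at most max_words whitespace-separated words.

    This is a stand-in for model-specific tokenization, which counts tokens
    rather than words.
    """
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def build_qa_records(chunks: List[str], ask_model: AskModel) -> List[Dict[str, str]]:
    """Call ask_model once per chunk and collect every returned Q&A pair."""
    records: List[Dict[str, str]] = []
    for chunk in chunks:
        records.extend(ask_model(chunk))
    return records


def save_json(records: List[Dict[str, str]], path: str) -> None:
    """Write the Q&A records to path as pretty-printed UTF-8 JSON."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(records, fh, ensure_ascii=False, indent=2)
```

In use, `ask_model` would wrap whatever GPT-4 client is available and parse its reply into a list of `{"question": ..., "answer": ...}` dictionaries. Keeping it as a parameter also makes the pipeline easy to exercise with a stub.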

Score: 27 / 100 (Experimental)

This tool helps you quickly turn large PDF documents, like manuals or research papers, into structured question-and-answer pairs. It takes your PDF files as input and automatically generates relevant questions and their answers using an AI model, outputting them into a standard JSON file. This is useful for educators, trainers, or content creators who need to build knowledge bases or practice materials from existing textual content.

No commits in the last 6 months.

Use this if you need to rapidly create question-and-answer datasets from your PDF documents without manually drafting each question and answer.

Not ideal if you require highly nuanced or subjective Q&A pairs that need deep human understanding or specific domain expertise not easily captured by an AI.

Tags: content-creation, education, training-materials, knowledge-management, document-processing
Status: No License, Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 6 / 25
Maturity 8 / 25
Community 13 / 25

Stars: 24
Forks: 4
Language: Python
License: none
Last pushed: Aug 22, 2023
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/jehumtine/synthetic_data_generator"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
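
The same request can be made from Python using only the standard library. The sketch below builds the endpoint URL from the `curl` example above. It assumes the endpoint returns a JSON body, which the page does not state explicitly, and the function names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example on this page.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def quality_url(owner: str, repo: str) -> str:
    """Build the quality endpoint URL for owner/repo, percent-encoding each part."""
    return f"{API_BASE}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality data and decode it.

    Assumption: the response body is JSON. If it is not, json.loads raises
    a ValueError.
    """
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("jehumtine", "synthetic_data_generator")` issues the same GET request as the `curl` command. Unauthenticated use counts against the 100 requests/day limit.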