antoninichiq/QADatasetBuilder

Efficiently Transform PDFs and Wikipedia Pages into a Questions & Answers Dataset for Fine-Tuning.

Experimental

This tool turns long documents such as PDFs or Wikipedia articles into structured question-and-answer pairs. It takes raw text from files or URLs and outputs an organized dataset of questions and answers. It is aimed at data science and AI practitioners who need task-specific Q&A datasets for training models.
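
As a rough illustration of the text-to-dataset flow described above, here is a minimal Python sketch. It is not the repository's actual code or API: the function names (`chunk_text`, `build_records`, `to_jsonl`, `placeholder_pair`), the record fields, and the JSONL output format are all assumptions made for this example, and the question generator is a stub standing in for whatever model the tool really uses.

```python
import json
from typing import Callable

def chunk_text(text: str, max_words: int = 200) -> list[str]:
    """Split raw text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def build_records(chunks: list[str],
                  generate_pair: Callable[[str], tuple[str, str]]) -> list[dict]:
    """Turn each chunk into one question/answer record (hypothetical schema)."""
    records = []
    for chunk in chunks:
        question, answer = generate_pair(chunk)
        records.append({"question": question, "answer": answer, "context": chunk})
    return records

def to_jsonl(records: list[dict]) -> str:
    """Serialize records as JSON Lines, one record per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)

def placeholder_pair(chunk: str) -> tuple[str, str]:
    # Hypothetical stand-in for a language-model call; a real generator
    # would write a question and answer grounded in the chunk.
    return ("What does this passage say?", chunk)

if __name__ == "__main__":
    text = "Paris is the capital of France. It lies on the Seine."
    records = build_records(chunk_text(text, max_words=6), placeholder_pair)
    print(to_jsonl(records))
```

Keeping the source chunk in each record is a design choice for this sketch: it lets you check later that an answer is actually supported by the document it came from.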

No commits in the last 6 months.

Use this if you need to build custom question-answering datasets from existing documents to fine-tune a language model for a specific task or knowledge domain.

Not ideal if you're looking for a ready-to-use Q&A system for general knowledge or if you don't have a specific AI model training goal in mind.

AI-training dataset-generation document-processing NLP-engineering machine-learning-ops

Stale (6 months), No Package, No Dependents

Maintenance 0 / 25

Adoption 5 / 25

Maturity 16 / 25

Community 0 / 25

Language: Python

License: MIT

Higher-rated alternatives

Pinafore/qb

QANTA Quiz Bowl AI

KristiyanVachev/Question-Generation

Generating multiple choice questions from text using Machine Learning.

wuba/qa_match

A simple effective ToolKit for short text matching

PolyAI-LDN/conversational-datasets

Large datasets for conversational AI

mcQA-suite/mcQA

🔮 Answering multiple choice questions with Language Models.
