antoninichiq/QADatasetBuilder

Efficiently Transform PDFs and Wikipedia Pages into a Questions & Answers Dataset for Fine-Tuning.

Score: 21 / 100 (Experimental)

This tool turns long, complex documents such as PDFs or Wikipedia articles into structured question-and-answer pairs. It takes raw text (from files or URLs) and outputs an organized dataset of questions and answers. It is aimed at data scientists and AI practitioners who need custom Q&A datasets to train AI models.

No commits in the last 6 months.

Use this if you need to build custom question-answering datasets from existing documents to fine-tune a language model for a specific task or knowledge domain.

Not ideal if you're looking for a ready-to-use Q&A system for general knowledge or if you don't have a specific AI model training goal in mind.

AI-training dataset-generation document-processing NLP-engineering machine-learning-ops
Stale 6m · No Package · No Dependents
Maintenance 0 / 25
Adoption 5 / 25
Maturity 16 / 25
Community 0 / 25


Stars: 9
Forks:
Language: Python
License: MIT
Last pushed: Mar 08, 2024
Commits (30d): 0

Get this data via API:

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/antoninichiq/QADatasetBuilder"

Open to everyone: 100 requests/day with no key needed, or get a free key for 1,000 requests/day.