ph-ausseil/llm-training-dataset-builder

Streamlines the creation of datasets for training a Large Language Model with instruction-input-output triplets. The default configuration matches the requirements of github.com/tloen/alpaca-lora.

Score: 21 / 100 (Experimental)

This tool helps data scientists and ML engineers create specialized training datasets for Large Language Models (LLMs). It takes raw business data, such as customer orders, from XML or JSON files or PostgreSQL databases, and transforms it into structured instruction-input-output question-answer pairs. These pairs are then used to fine-tune an LLM, teaching it about specific company data and processes.
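For reference, a single training example in the alpaca-lora style is a JSON object with instruction, input, and output fields. A minimal sketch of building one such triplet from a raw order record (the record fields and question here are hypothetical, not taken from the tool itself):

```python
import json

# Hypothetical raw record, e.g. parsed from a JSON/XML file or a PostgreSQL row.
order = {"order_id": "A-1042", "customer": "ACME Corp", "total": 249.90}

# Build one instruction-input-output triplet in the alpaca-lora style.
triplet = {
    "instruction": "What is the total amount of the given order?",
    "input": json.dumps(order),
    "output": f"Order {order['order_id']} for {order['customer']} totals {order['total']:.2f}.",
}

print(json.dumps(triplet, indent=2))
```

A full dataset is simply a list of such objects serialized to JSON for the fine-tuning script.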

No commits in the last 6 months.

Use this if you need to train a Large Language Model with your own specific business data, such as transaction records or operational logs, to make it knowledgeable about your unique company context.

Not ideal if you are looking for a pre-trained general-purpose LLM or a tool to simply deploy an existing LLM without custom training.

LLM-fine-tuning custom-AI-training business-data-integration question-answering-dataset machine-learning-engineering
Stale (6 months) · No package · No dependents
Maintenance 0 / 25
Adoption 5 / 25
Maturity 16 / 25
Community 0 / 25


Stars: 13
Forks: (not listed)
Language: Python
License: (not listed)
Category: llm-fine-tuning
Last pushed: Apr 17, 2023
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/ph-ausseil/llm-training-dataset-builder"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
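The same endpoint can be queried from Python with only the standard library. A minimal sketch: the response schema is not documented here, so the example just builds the URL and (optionally) fetches the raw JSON.

```python
import json
from urllib.request import urlopen

API_BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(registry: str, owner: str, repo: str) -> str:
    """Build the quality-score endpoint URL for a repository."""
    return f"{API_BASE}/{registry}/{owner}/{repo}"

url = quality_url("transformers", "ph-ausseil", "llm-training-dataset-builder")
print(url)

# Uncomment to fetch the live data (subject to the 100 requests/day limit):
# with urlopen(url) as resp:
#     data = json.load(resp)
#     print(data)
```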