ph-ausseil/llm-training-dataset-builder
Streamlines the creation of datasets for training a Large Language Model with instruction-input-output triplets. The default configuration matches the requirements of github.com/tloen/alpaca-lora.
This tool helps data scientists and ML engineers create specialized training datasets for Large Language Models (LLMs). It takes raw business data, such as customer orders, from XML or JSON files or PostgreSQL databases and transforms it into structured instruction-input-output question-answer pairs. These pairs are then used to fine-tune an LLM, teaching it about specific company data and processes.
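To illustrate the target format: a record is mapped to a JSON object with `instruction`, `input`, and `output` keys, the triplet layout that alpaca-lora expects. The sketch below is a minimal, hypothetical example; the order fields and the prompt wording are made up and are not taken from this repository's code.

```python
import json

def order_to_triplet(order):
    # Map one raw business record (here, a made-up customer order)
    # to an Alpaca-style instruction-input-output triplet.
    return {
        "instruction": "Summarize the status of this customer order.",
        "input": json.dumps(order),
        "output": f"Order {order['id']} for {order['customer']} is {order['status']}.",
    }

order = {"id": 42, "customer": "ACME Corp", "status": "shipped"}
triplet = order_to_triplet(order)
print(json.dumps(triplet, indent=2))
```

A dataset is then just a JSON list of such triplets, which can be passed to a fine-tuning script that accepts the Alpaca format.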
No commits in the last 6 months.
Use this if you need to train a Large Language Model with your own specific business data, such as transaction records or operational logs, to make it knowledgeable about your unique company context.
Not ideal if you are looking for a pre-trained general-purpose LLM or a tool to simply deploy an existing LLM without custom training.
Stars: 13
Forks: —
Language: Python
License: —
Category: —
Last pushed: Apr 17, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/ph-ausseil/llm-training-dataset-builder"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
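The same endpoint can be called from Python. This is a minimal sketch: the URL pattern comes from the curl command above, but the shape of the JSON response is not documented here, so the fetch is left commented out rather than assumed.

```python
import urllib.request
import json

BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(owner, repo):
    # Build the per-repository endpoint URL shown on this page.
    return f"{BASE}/{owner}/{repo}"

url = quality_url("ph-ausseil", "llm-training-dataset-builder")
print(url)

# Uncomment to actually fetch (100 requests/day without a key):
# with urllib.request.urlopen(url) as resp:
#     data = json.load(resp)
#     print(data)
```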
Higher-rated alternatives
OptimalScale/LMFlow
An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All.
adithya-s-k/AI-Engineering.academy
Mastering Applied AI, One Concept at a Time
jax-ml/jax-llm-examples
Minimal yet performant LLM examples in pure JAX
young-geng/scalax
A simple library for scaling up JAX programs
riyanshibohra/TuneKit
Upload your data → Get a fine-tuned SLM. Free.