ph-ausseil/llm-training-dataset-builder
Streamlines the creation of datasets for training a Large Language Model with instruction-input-output triplets. The default configuration matches the requirements of github.com/tloen/alpaca-lora.
This tool helps data scientists and ML engineers create specialized training datasets for Large Language Models (LLMs). It takes raw business data, such as customer orders, from XML or JSON files or PostgreSQL databases and transforms it into structured instruction-input-output question-answer pairs. These pairs are then used to fine-tune an LLM, teaching it about specific company data and processes.
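To illustrate the target format: a record is mapped to a JSON object with `instruction`, `input`, and `output` keys, the triplet layout that alpaca-lora expects. The sketch below is a minimal, hypothetical example; the order fields and the prompt wording are made up and are not taken from this repository's code.

```python
import json

def order_to_triplet(order):
    # Map one raw business record (here, a made-up customer order)
    # to an Alpaca-style instruction-input-output triplet.
    return {
        "instruction": "Summarize the status of this customer order.",
        "input": json.dumps(order),
        "output": f"Order {order['id']} for {order['customer']} is {order['status']}.",
    }

order = {"id": 42, "customer": "ACME Corp", "status": "shipped"}
triplet = order_to_triplet(order)
print(json.dumps(triplet, indent=2))
```

A dataset is then just a JSON list of such triplets, which can be passed to a fine-tuning script that accepts the Alpaca format.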
No commits in the last 6 months.
Use this if you need to train a Large Language Model with your own specific business data, such as transaction records or operational logs, to make it knowledgeable about your unique company context.
Not ideal if you are looking for a pre-trained general-purpose LLM or a tool to simply deploy an existing LLM without custom training.
Stars: 13
Forks: —
Language: Python
License: —
Category: —
Last pushed: Apr 17, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/ph-ausseil/llm-training-dataset-builder"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
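The same endpoint can be called from Python. This is a minimal sketch: the URL pattern comes from the curl command above, but the shape of the JSON response is not documented here, so the fetch is left commented out rather than assumed.

```python
import urllib.request
import json

BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(owner, repo):
    # Build the per-repository endpoint URL shown on this page.
    return f"{BASE}/{owner}/{repo}"

url = quality_url("ph-ausseil", "llm-training-dataset-builder")
print(url)

# Uncomment to actually fetch (100 requests/day without a key):
# with urllib.request.urlopen(url) as resp:
#     data = json.load(resp)
#     print(data)
```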
Higher-rated alternatives
OptimalScale/LMFlow
An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All.
adithya-s-k/AI-Engineering.academy
Mastering Applied AI, One Concept at a Time
jax-ml/jax-llm-examples
Minimal yet performant LLM examples in pure JAX
young-geng/scalax
A simple library for scaling up JAX programs
riyanshibohra/TuneKit
Upload your data → Get a fine-tuned SLM. Free.