yigitkonur/cli-finetune-dataset

weighted category-balanced dataset builder for LLM fine-tuning

33 / 100 (Emerging)

Fine-tuning a large language model often starts with many conversation examples grouped into categories, while training needs a single, balanced dataset. This tool takes a directory of categorized conversation files and merges them into one shuffled dataset in which each category contributes a specified, weighted proportion. It is aimed at machine learning engineers and researchers preparing custom datasets for LLM fine-tuning.

Use this if you need to build a balanced LLM fine-tuning dataset from multiple JSONL files, with each category represented at a controlled proportion.

Not ideal if your input data isn't in OpenAI chat-format JSONL files or you don't need to balance categories by weight.
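The weighted merge described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual implementation: the function names `load_categories` and `build_balanced_dataset`, the one-file-per-category layout, the rounding of quotas, and oversampling with replacement when a category is too small are all assumptions made for the example.

```python
import json
import random
from pathlib import Path


def load_categories(input_dir):
    """Read every *.jsonl file in input_dir (assumed layout: one file per category).

    The file stem is used as the category name; each non-empty line is expected
    to be one OpenAI chat-format record such as {"messages": [...]}.
    """
    categories = {}
    for path in sorted(Path(input_dir).glob("*.jsonl")):
        with path.open(encoding="utf-8") as handle:
            categories[path.stem] = [json.loads(line) for line in handle if line.strip()]
    return categories


def build_balanced_dataset(categories, weights, total, seed=0):
    """Sample roughly `total` examples so each category's share follows `weights`.

    Quotas are rounded per category, so the final size can differ from `total`
    by a few examples. Empty categories are skipped.
    """
    rng = random.Random(seed)
    weight_sum = sum(weights[name] for name in categories)
    dataset = []
    for name, examples in sorted(categories.items()):
        if not examples:
            continue
        quota = round(total * weights[name] / weight_sum)
        if quota <= len(examples):
            picked = rng.sample(examples, quota)  # sample without replacement
        else:
            picked = rng.choices(examples, k=quota)  # assumption: oversample with replacement
        dataset.extend(picked)
    rng.shuffle(dataset)  # mix categories so training batches are not grouped
    return dataset
```

With weights of 3 for one category and 1 for another and a target of 8 examples, this sketch draws 6 and 2 examples respectively before shuffling. The result could then be written back out as one JSON object per line.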

LLM fine-tuning, dataset preparation, natural language processing, machine learning engineering
No License, No Package, No Dependents
Maintenance 10 / 25
Adoption 6 / 25
Maturity 8 / 25
Community 9 / 25


Stars: 16
Forks: 2
Language: Python
License: none
Last pushed: Feb 21, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/yigitkonur/cli-finetune-dataset"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
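The curl request above can also be issued from Python using only the standard library. The sketch below is illustrative: the helper names are hypothetical, and the `domain/owner/repo` path pattern is inferred from the single URL shown, so it may not hold for other repositories. The response format is not documented here, so the body is returned as text (UTF-8 is assumed) rather than parsed.

```python
import urllib.request
from urllib.parse import quote

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(domain, owner, repo):
    """Build the endpoint URL (path pattern assumed from the one example URL)."""
    return f"{BASE_URL}/{quote(domain)}/{quote(owner)}/{quote(repo)}"


def fetch_quality(domain, owner, repo, timeout=10):
    """Fetch the raw response body as text; no key is sent."""
    url = build_quality_url(domain, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")  # assumption: UTF-8 encoded body
```

Calling `fetch_quality("ml-frameworks", "yigitkonur", "cli-finetune-dataset")` requests the same URL as the curl command and counts against the daily request limit.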