LEL-A/GerAlpacaDataCleaned

German Alpaca Dataset (Cleaned + Translated)

Score: 27 / 100 (Experimental)

This project provides a ready-to-use collection of German language instructions, inputs, and desired outputs for training language models. It takes an existing English-language dataset of conversational prompts and translates them into German, while also cleaning up inconsistencies and excessively long responses. The ideal user is an NLP researcher or machine learning engineer working on fine-tuning large language models for German-speaking applications.
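As a rough illustration of how such instruction/input/output records might be prepared for fine-tuning, the sketch below builds a prompt string from one record and filters out overly long outputs. The field names `instruction`, `input`, and `output` are assumed from the original Alpaca convention. The German prompt template, the `max_chars` threshold, and the sample records are made up for this example and are not taken from the repository.

```python
from typing import Dict, List

# Assumed record layout (Alpaca convention): "instruction", "input", "output".
Record = Dict[str, str]


def build_prompt(record: Record) -> str:
    """Join the instruction and the optional input into one prompt string.

    The German section headers are a hypothetical template, not the
    repository's own format.
    """
    instruction = record["instruction"].strip()
    context = record.get("input", "").strip()
    if context:
        return (
            f"### Anweisung:\n{instruction}\n\n"
            f"### Eingabe:\n{context}\n\n"
            f"### Antwort:\n"
        )
    return f"### Anweisung:\n{instruction}\n\n### Antwort:\n"


def drop_long_outputs(records: List[Record], max_chars: int = 2000) -> List[Record]:
    """Keep only records whose output has at most max_chars characters."""
    return [r for r in records if len(r.get("output", "")) <= max_chars]


# Illustrative records only; they do not come from the dataset.
sample = [
    {"instruction": "Nenne drei Primärfarben.", "input": "", "output": "Rot, Blau und Gelb."},
    {"instruction": "Übersetze ins Englische.", "input": "Guten Morgen", "output": "x" * 5000},
]
kept = drop_long_outputs(sample, max_chars=2000)
```

A character-based cutoff is the simplest possible length filter. A token-based limit tied to the target model's tokenizer would likely be a better fit in practice.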

No commits in the last 6 months.

Use this if you need a high-quality, pre-translated, and cleaned dataset of German instruction-following examples to train or fine-tune your NLP models.

Not ideal if you are looking for raw, untranslated English data or if your project requires a dataset specifically designed for a different language.

Topics: Natural Language Processing, German Language AI, Large Language Model Training, Instruction Tuning, Machine Learning Datasets
Status: Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 7 / 25
Maturity 16 / 25
Community 4 / 25

Stars: 26
Forks: 1
Language: Jupyter Notebook
License: MIT
Last pushed: Apr 06, 2023
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/LEL-A/GerAlpacaDataCleaned"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
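The same request can be made from Python with only the standard library. This is a minimal sketch: reading the URL path as category, owner, and repository is inferred from the curl command above, and the assumption that the endpoint returns JSON is not confirmed by this page. The function names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(owner: str, repo: str, category: str = "ml-frameworks") -> str:
    """Build the endpoint URL; the category/owner/repo layout is inferred."""
    parts = [quote(p, safe="") for p in (category, owner, repo)]
    return f"{BASE_URL}/" + "/".join(parts)


def fetch_quality(owner: str, repo: str, category: str = "ml-frameworks",
                  timeout: float = 10.0):
    """Fetch the quality record, assuming the response body is JSON."""
    url = quality_url(owner, repo, category)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("LEL-A", "GerAlpacaDataCleaned")` would issue the same GET request as the curl command. Unauthenticated calls count against the 100 requests/day limit.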