yueyu1030/AttrPrompt

[NeurIPS 2023] This is the code for the paper `Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias`.

/ 100

Emerging

This project helps machine learning practitioners generate training datasets for text classification tasks. It takes existing labeled text data and uses large language models to expand it into diverse, attributed training data. The output is a dataset ready for training classifiers, which makes the project suited to data scientists, ML engineers, and researchers building text classifiers.
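To illustrate what "attributed" data generation might look like, here is a minimal sketch that combines a class label with randomly sampled attributes to form a generation prompt. The attribute names and values, the `sample_attributes` and `build_prompt` helpers, and the prompt wording are all hypothetical and are not taken from the repository. The sketch stops at building the prompt and makes no LLM call.

```python
import random

# Hypothetical attribute dimensions, chosen for illustration only;
# the actual project may define different attributes and values.
ATTRIBUTES = {
    "length": ["under 50 words", "50 to 100 words", "over 100 words"],
    "style": ["formal", "conversational", "journalistic"],
}


def sample_attributes(rng):
    """Pick one value for each attribute dimension."""
    return {name: rng.choice(values) for name, values in ATTRIBUTES.items()}


def build_prompt(label, attrs):
    """Combine a class label and sampled attributes into one prompt string."""
    constraints = "; ".join(f"{name}: {value}" for name, value in attrs.items())
    return f"Write a text for the class '{label}'. Constraints: {constraints}."


rng = random.Random(0)  # fixed seed so repeated runs sample the same attributes
labels = ["sports", "politics"]  # hypothetical class labels
prompts = [
    build_prompt(label, sample_attributes(rng))
    for label in labels
    for _ in range(3)  # three prompts per class
]

for prompt in prompts:
    print(prompt)
```

Each prompt would then be sent to a language model of your choice, and the returned text stored with its class label as one training example. Varying the attributes per prompt is what is meant to make the generated set more diverse than repeating one fixed prompt per class.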

156 stars. No commits in the last 6 months.

Use this if you need diverse, attributed training datasets for text classification, especially when you use large language models to augment your data.

Not ideal if you are looking for a tool to train the classification models themselves, rather than generate the training data.

text-classification dataset-generation natural-language-processing machine-learning-engineering

Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 12 / 25


Stars 156

Forks

Language Python

License Apache-2.0

Higher-rated alternatives

MadryLab/context-cite

Attribute (or cite) statements generated by LLMs back to in-context information.

microsoft/augmented-interpretable-models

Interpretable and efficient predictors using pre-trained language models. Scikit-learn compatible.

Trustworthy-ML-Lab/CB-LLMs

[ICLR 25] A novel framework for building intrinsically interpretable LLMs with...

poloclub/LLM-Attributor

LLM Attributor: Attribute LLM's Generated Text to Training Data

THUDM/LongCite

LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA
