bloomberg/MixCE-acl2023
Implementation of the MixCE method described in the ACL 2023 paper by Zhang et al.
This project implements MixCE, a training objective for autoregressive language models such as GPT-2 that mixes the forward and reverse cross-entropies. Compared with standard maximum-likelihood (forward cross-entropy) training, the mixed objective aims to yield more natural and contextually relevant generations. It is designed for researchers and practitioners who are building or fine-tuning text generation systems and want to improve output quality.
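As a rough illustration of the idea, the sketch below mixes the standard forward cross-entropy with a self-weighted term that approximates a reverse cross-entropy. This is a minimal conceptual sketch, not the repository's implementation: the function name `mixce_loss`, the mixing weight `eta`, and the particular self-weighting used here are assumptions; consult the paper and repo for the exact formulation.

```python
import torch
import torch.nn.functional as F

def mixce_loss(logits, targets, eta=0.5):
    """Conceptual sketch of a mixed cross-entropy objective.

    Blends the forward cross-entropy (standard MLE term) with a
    term in which each token's log-probability is weighted by the
    model's own probability of that token, a common way to
    approximate a reverse cross-entropy. `eta` (hypothetical name)
    controls the mix; eta=1 recovers plain cross-entropy.
    """
    # logits: (batch, seq, vocab); targets: (batch, seq)
    log_probs = F.log_softmax(logits, dim=-1)
    # log-probability of each gold token under the model
    tok_logp = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    # model probability of the gold token (detached: used only as a weight)
    tok_p = tok_logp.detach().exp()
    # eta weights the forward-CE part; (1 - eta) weights the
    # self-reinforced (reverse-CE-like) part
    weights = eta + (1.0 - eta) * tok_p
    return -(weights * tok_logp).mean()
```

With `eta=1.0` this reduces to ordinary token-level cross-entropy, which makes it easy to sanity-check before experimenting with smaller mixing weights.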
No commits in the last 6 months.
Use this if you are a researcher or engineer working on autoregressive language models and want to enhance their training process for better text generation quality.
Not ideal if you are looking for a ready-to-use application or a simpler method for basic text generation tasks without deep involvement in model training specifics.
Stars
20
Forks
3
Language
Python
License
Apache-2.0
Category
Last pushed
May 29, 2023
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/bloomberg/MixCE-acl2023"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
kyegomez/LIMoE
Implementation of the "the first large-scale multimodal mixture of experts models." from the...
dohlee/chromoformer
The official code implementation for Chromoformer in PyTorch. (Lee et al., Nature Communications. 2022)
ahans30/goldfish-loss
[NeurIPS 2024] Goldfish Loss: Mitigating Memorization in Generative LLMs
yinboc/trans-inr
Transformers as Meta-Learners for Implicit Neural Representations, in ECCV 2022
ibnaleem/mixtral.py
A Python module for running the Mixtral-8x7B language model with customisable precision and...