bloomberg/MixCE-acl2023
Implementation of the MixCE method described in the ACL 2023 paper by Zhang et al.
This project implements MixCE, a training objective for autoregressive language models such as GPT-2 that mixes the forward and reverse cross-entropies. Compared with standard maximum-likelihood (forward cross-entropy) training, the mixed objective aims to yield more natural and contextually relevant generations. It is designed for researchers and practitioners who are building or fine-tuning text generation systems and want to improve output quality.
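As a rough illustration of the idea, the sketch below mixes the standard forward cross-entropy with a self-weighted term that approximates a reverse cross-entropy. This is a minimal conceptual sketch, not the repository's implementation: the function name `mixce_loss`, the mixing weight `eta`, and the particular self-weighting used here are assumptions; consult the paper and repo for the exact formulation.

```python
import torch
import torch.nn.functional as F

def mixce_loss(logits, targets, eta=0.5):
    """Conceptual sketch of a mixed cross-entropy objective.

    Blends the forward cross-entropy (standard MLE term) with a
    term in which each token's log-probability is weighted by the
    model's own probability of that token, a common way to
    approximate a reverse cross-entropy. `eta` (hypothetical name)
    controls the mix; eta=1 recovers plain cross-entropy.
    """
    # logits: (batch, seq, vocab); targets: (batch, seq)
    log_probs = F.log_softmax(logits, dim=-1)
    # log-probability of each gold token under the model
    tok_logp = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    # model probability of the gold token (detached: used only as a weight)
    tok_p = tok_logp.detach().exp()
    # eta weights the forward-CE part; (1 - eta) weights the
    # self-reinforced (reverse-CE-like) part
    weights = eta + (1.0 - eta) * tok_p
    return -(weights * tok_logp).mean()
```

With `eta=1.0` this reduces to ordinary token-level cross-entropy, which makes it easy to sanity-check before experimenting with smaller mixing weights.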
No commits in the last 6 months.
Use this if you are a researcher or engineer working on autoregressive language models and want to enhance their training process for better text generation quality.
Not ideal if you are looking for a ready-to-use application or a simpler method for basic text generation tasks without deep involvement in model training specifics.
Stars
20
Forks
3
Language
Python
License
Apache-2.0
Category
Last pushed
May 29, 2023
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/bloomberg/MixCE-acl2023"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
kyegomez/LIMoE
Implementation of the "the first large-scale multimodal mixture of experts models." from the...
dohlee/chromoformer
The official code implementation for Chromoformer in PyTorch. (Lee et al., Nature Communications. 2022)
ahans30/goldfish-loss
[NeurIPS 2024] Goldfish Loss: Mitigating Memorization in Generative LLMs
yinboc/trans-inr
Transformers as Meta-Learners for Implicit Neural Representations, in ECCV 2022
ibnaleem/mixtral.py
A Python module for running the Mixtral-8x7B language model with customisable precision and...