allenai/dont-stop-pretraining
Code associated with the Don't Stop Pretraining ACL 2020 paper
This project helps researchers and data scientists improve language-model performance on specific applications. It provides pre-trained models and tools to adapt general-purpose language models (such as RoBERTa) to specialized domains (e.g., biomedical text, computer science papers, product reviews) or particular tasks (e.g., citation intent classification, chemical-protein relation extraction). You supply domain-specific text or task-specific labeled data, and the system produces a fine-tuned language model with improved performance on your target application.
540 stars. No commits in the last 6 months.
Use this if you need highly accurate natural language processing (NLP) models for specialized text, such as scientific papers, legal documents, or customer reviews, where standard language models underperform.
Not ideal if you want a simple, off-the-shelf NLP tool for general-purpose text analysis with no domain- or task-specific adaptation.
Stars: 540
Forks: 73
Language: Python
License: —
Category: —
Last pushed: Nov 15, 2021
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/allenai/dont-stop-pretraining"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
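For programmatic use, the curl command above can be sketched in Python. This is a minimal example assuming only the URL pattern shown in the curl line; the JSON field names in the response are not documented here, so the code returns the decoded payload without interpreting it. The `build_url` helper and the `category` parameter (`"nlp"` in the URL above) are assumptions about the path layout.

```python
# Minimal sketch of calling the quality API, assuming the URL pattern
# from the curl example: /api/v1/quality/<category>/<owner>/<repo>.
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"


def build_url(category: str, owner: str, repo: str) -> str:
    """Construct the API URL for one repository (path format assumed)."""
    return f"{BASE}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str) -> dict:
    """Fetch and decode the JSON payload for one repository.

    Requires network access and counts against the anonymous
    100-requests/day limit. Response schema is not documented here.
    """
    with urllib.request.urlopen(build_url(category, owner, repo)) as resp:
        return json.load(resp)
```

Example: `fetch_quality("nlp", "allenai", "dont-stop-pretraining")` reproduces the curl request above.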
Higher-rated alternatives
n-waves/multifit
The code to reproduce results from paper "MultiFiT: Efficient Multi-lingual Language Model...
princeton-nlp/SimCSE
[EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821
yxuansu/SimCTG
[NeurIPS'22 Spotlight] A Contrastive Framework for Neural Text Generation
alibaba-edu/simple-effective-text-matching
Source code of the ACL2019 paper "Simple and Effective Text Matching with Richer Alignment Features".
Shark-NLP/OpenICL
OpenICL is an open-source framework to facilitate research, development, and prototyping of...