allenai/dont-stop-pretraining

Code associated with the ACL 2020 paper "Don't Stop Pretraining: Adapt Language Models to Domains and Tasks"

Quality score: 38/100 (Emerging)

This project helps researchers and data scientists improve language-model performance on specific applications. It provides pretrained models and tools to adapt a general-purpose language model (such as RoBERTa) to a specialized domain (e.g., biomedical text, computer science papers, product reviews) via domain-adaptive pretraining, or to a particular task (e.g., citation intent classification, chemical-protein relation extraction) via task-adaptive pretraining. You provide domain-specific text or task-specific labeled data, and the system outputs a fine-tuned language model with better performance on your target application.
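As a rough sketch of how the released checkpoints are typically consumed, the lines below load one of the domain-adapted RoBERTa models with the Hugging Face transformers library. The checkpoint name allenai/biomed_roberta_base is an assumption; check the repository README for the exact identifiers of the released models.

import torch
from transformers import AutoModel, AutoTokenizer

# Assumed checkpoint name; see the repo README for the released model names.
MODEL_NAME = "allenai/biomed_roberta_base"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)

# Encode a domain-specific sentence and inspect the contextual embeddings.
inputs = tokenizer("The inhibitor binds the kinase domain.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, hidden_size)

From there, the usual workflow is to fine-tune the adapted encoder on your labeled task data, as the paper's experiments do.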

540 stars. No commits in the last 6 months.

Use this if you need to build highly accurate natural language processing (NLP) models for specialized text, such as scientific papers, legal documents, or customer reviews, where standard general-purpose language models underperform.

Not ideal if you're looking for a simple, off-the-shelf NLP tool for general-purpose text analysis with no domain- or task-specific adaptation needs.

Tags: Biomedical Research, Computer Science, Customer Reviews Analysis, News Analysis, Natural Language Processing
Badges: No License, Stale (6 months), No Package, No Dependents
Maintenance: 0/25
Adoption: 10/25
Maturity: 8/25
Community: 20/25


Stars: 540
Forks: 73
Language: Python
License: None
Last pushed: Nov 15, 2021
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/allenai/dont-stop-pretraining"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
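For programmatic access, a minimal Python sketch using the requests library is shown below. The response field names are assumptions inferred from the stats displayed on this page; inspect the returned JSON for the actual schema.

import requests

URL = "https://pt-edge.onrender.com/api/v1/quality/nlp/allenai/dont-stop-pretraining"

# No key is needed for up to 100 requests/day.
resp = requests.get(URL, timeout=10)
resp.raise_for_status()
data = resp.json()

# Field names ("score", "stars") are assumptions; adjust to the real schema.
print(data.get("score"), data.get("stars"))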