surrey-nlp/PLOD-AbbreviationDetection
This repository contains the PLOD Dataset for Abbreviation Detection, released with our LREC 2022 publication.
This dataset helps researchers and NLP practitioners automatically identify abbreviations and their full forms within scientific documents. It provides a large collection of annotated text segments in which abbreviations (e.g. "NLP") and their long forms (e.g. "Natural Language Processing") are marked. Researchers building tools for tasks like information retrieval or machine translation can use it to improve their models' understanding of specialized text.
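A minimal sketch of how token-level annotations of this kind can be turned back into abbreviation and long-form strings. It assumes a BIO-style scheme where `B-AC` marks an abbreviation token, `B-LF`/`I-LF` begin and continue a long form, and any other tag (for example `B-O`) is outside a span. That label scheme is an assumption, so check the dataset card before relying on it; the function name `extract_spans` is hypothetical.

```python
from typing import List, Optional, Tuple


def extract_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Return (span_type, text) pairs such as ("AC", "NLP").

    Assumed tags: "B-AC"/"I-AC" and "B-LF"/"I-LF"; everything else is outside.
    """
    spans: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Close the span being built, if any, and reset the buffer.
        nonlocal current_type, current_tokens
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))
        current_type, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") and tag[2:] in ("AC", "LF"):
            # A B- tag always starts a new span.
            flush()
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and tag[2:] == current_type:
            # An I- tag of the same type extends the open span.
            current_tokens.append(token)
        else:
            # Outside tags (and stray I- tags) close the open span.
            flush()
    flush()
    return spans
```

For the sentence "We use Natural Language Processing ( NLP ) ." tagged `B-O B-O B-LF I-LF I-LF B-O B-AC B-O B-O`, this returns `[("LF", "Natural Language Processing"), ("AC", "NLP")]`. Stray `I-` tags without a matching open span are dropped rather than promoted to new spans, which is one design choice among several.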
No commits in the last 6 months.
Use this if you need a large, pre-annotated dataset to train or evaluate machine learning models for detecting abbreviations and their long forms in scientific text.
Not ideal if you need a plug-and-play tool for real-time abbreviation expansion, as this project provides a dataset for model training, not a ready-to-use application.
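When the dataset is used to evaluate a model, one common convention is exact-match span scoring: a predicted abbreviation or long form counts only if its type and token boundaries match the gold annotation. The sketch below assumes the same hypothetical `B-AC`/`B-LF`/`I-LF` tag scheme; it is not necessarily the metric reported in the PLOD paper, and `tag_spans` and `span_prf` are hypothetical names.

```python
from typing import List, Optional, Set, Tuple

Span = Tuple[str, int, int]  # (type, start token index, end index exclusive)


def tag_spans(tags: List[str]) -> Set[Span]:
    """Collect AC/LF spans from one tag sequence (assumed BIO-style labels)."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    kind: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        prefix, _, label = tag.partition("-")
        continues = prefix == "I" and label == kind
        if kind is not None and not continues:
            spans.add((kind, start, i))
            kind = None
        if prefix == "B" and label in ("AC", "LF"):
            start, kind = i, label
    return spans


def span_prf(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged exact-match precision, recall and F1 over sentences."""
    tp = fp = fn = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = tag_spans(g_tags), tag_spans(p_tags)
        tp += len(g & p)  # spans found with the right type and boundaries
        fp += len(p - g)  # predicted spans absent from the gold annotation
        fn += len(g - p)  # gold spans the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

A prediction that truncates a three-token long form to two tokens but finds the abbreviation correctly scores precision, recall and F1 of 0.5 each under this strict convention; token-level or partial-overlap scoring would be more lenient.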
Stars: 12
Forks: 5
Language: Jupyter Notebook
License: CC-BY-SA-4.0
Last pushed: Sep 25, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/surrey-nlp/PLOD-AbbreviationDetection"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
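The same request can be made from Python with only the standard library. This is a sketch under two assumptions: that the endpoint returns JSON, and that other repositories in the same category follow the same `owner/repo` path pattern as the URL above. The helper names `quality_url` and `fetch_quality` are hypothetical, and the keyed (1,000/day) access is not shown because the page does not document how the key is passed.

```python
import json
import urllib.request
from typing import Any

# Base path taken from the curl example; the owner/repo suffix pattern is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/transformers"


def quality_url(owner: str, repo: str) -> str:
    """Build the quality endpoint URL for a repository (assumed path pattern)."""
    return f"{BASE_URL}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0) -> Any:
    """GET the endpoint and decode the body as JSON (assumed response format)."""
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("surrey-nlp", "PLOD-AbbreviationDetection")` issues the same GET request as the curl command; unauthenticated calls count against the 100 requests/day limit.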
Higher-rated alternatives
floriankark/cs224n-win2223
Code and written solutions of the assignments of the Stanford CS224N: Natural Language...
ThinamXx/Transformers_NLP
The repository will contain a list of projects which we will work on while reading the books of...
dipanjanS/adv_nlp_workshop_odsc_europe22
Extensive tutorials for the Advanced NLP Workshop in Open Data Science Conference Europe 2020....
ruanchaves/napolab
The Natural Portuguese Language Benchmark (Napolab). Stay up to date with the latest...
mantasu/cs224n
Solutions for CS224n (2022)