strubell/preprocess-conll05
Scripts for preprocessing the CoNLL-2005 SRL dataset.
These scripts help computational linguists and NLP researchers prepare the CoNLL-2005 Semantic Role Labeling (SRL) dataset. They take the raw Penn Treebank and CoNLL-2005 data as input and produce structured text files containing word forms, part-of-speech tags, gold syntax, and labeled semantic arguments, ready for training or evaluating SRL models.
No commits in the last 6 months.
Use this if you need to standardize and enrich the CoNLL-2005 dataset for semantic role labeling research, especially if you plan to convert constituency parses to dependency parses or use BIO format for span representation.
Not ideal if you are working with a different natural language processing task or dataset, as these scripts are specifically tailored for CoNLL-2005 SRL.
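To make the BIO span representation mentioned above concrete, here is a minimal sketch in Python. It is not taken from the repository's scripts. It assumes the standard CoNLL-2005 bracketed argument column (cells such as `(A0*`, `*`, `*)`, `(V*)`) and assumes spans within a single column do not nest. The function name `star_to_bio` is hypothetical.

```python
def star_to_bio(column):
    """Convert one CoNLL-2005 bracketed argument column to BIO tags.

    Assumption: spans within a single column do not nest.
    """
    bio = []
    current = None  # label of the span that is currently open, if any
    for cell in column:
        if cell.startswith("("):
            # A span opens on this token, e.g. "(A0*" or "(V*)".
            current = cell[1:cell.index("*")]
            bio.append("B-" + current)
        elif current is not None:
            # Inside a span opened on an earlier token.
            bio.append("I-" + current)
        else:
            bio.append("O")
        if cell.endswith(")"):
            # The span closes on this token.
            current = None
    return bio


if __name__ == "__main__":
    # Illustrative column for one predicate of a seven-token sentence.
    column = ["(A0*", "*)", "(V*)", "(A1*", "*", "*)", "*"]
    print(star_to_bio(column))
    # ['B-A0', 'I-A0', 'B-V', 'B-A1', 'I-A1', 'I-A1', 'O']
```

Each predicate in a sentence has its own argument column, so a full conversion would apply this function once per column.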
Stars
24
Forks
6
Language
Shell
License
—
Category
Last pushed
Mar 28, 2019
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/strubell/preprocess-conll05"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
luheng/deep_srl
Code and pre-trained model for: Deep Semantic Role Labeling: What Works and What's Next
sileod/tasksource
Datasets collection and preprocessings framework for NLP extreme multitask learning
loomchild/maligna
Bilingual sentence aligner
CK-Explorer/DuoSubs
Semantic subtitle aligner and merger for bilingual subtitle syncing.
coastalcph/lex-glue
LexGLUE: A Benchmark Dataset for Legal Language Understanding in English