strubell/preprocess-conll05

Scripts for preprocessing the CoNLL-2005 SRL dataset.

Emerging

These scripts help computational linguists and NLP researchers prepare the CoNLL-2005 Semantic Role Labeling (SRL) dataset. They take the raw Penn Treebank and CoNLL-2005 data as input and produce structured text files containing word forms, part-of-speech tags, gold syntax, and labeled semantic arguments, ready for training or evaluating SRL models.

No commits in the last 6 months.

Use this if you need to standardize and enrich the CoNLL-2005 dataset for semantic role labeling research, especially if you plan to convert constituency parses to dependency parses or use BIO format for span representation.
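
To make the BIO span representation concrete, here is a minimal Python sketch of converting one CoNLL-2005 argument column from its bracketed notation (for example `(A0*`, `*`, `*)`) into BIO tags. It is an illustration only, not code from this repository: the function name `props_to_bio` is hypothetical, and the sketch assumes the spans within a single predicate's column are flat (not nested).

```python
def props_to_bio(column):
    """Convert one bracketed CoNLL-2005 argument column to BIO tags.

    Hypothetical helper for illustration; assumes spans in the column
    are flat, i.e. a new span never opens before the previous one closes.
    """
    tags = []
    current = None  # label of the span that is currently open, if any
    for cell in column:
        if cell.startswith("("):
            # A cell such as "(A0*" or "(V*)" opens a new span.
            current = cell[1:cell.index("*")]
            tags.append("B-" + current)
        elif current is not None:
            # Inside a span that was opened on an earlier token.
            tags.append("I-" + current)
        else:
            # Outside any labeled span.
            tags.append("O")
        if cell.endswith(")"):
            # A trailing ")" closes the open span after this token.
            current = None
    return tags


column = ["(A0*", "*)", "(V*)", "*", "(A1*", "*", "*)"]
bio_tags = props_to_bio(column)
print(bio_tags)
```

Each token receives exactly one tag, so the output list always has the same length as the input column and can be written out alongside the word forms and part-of-speech tags.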

Not ideal if you are working with a different natural language processing task or dataset, as these scripts are specifically tailored for CoNLL-2005 SRL.

natural-language-processing semantic-role-labeling computational-linguistics text-annotation corpus-preparation

No License, Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 6 / 25

Maturity 8 / 25

Community 16 / 25

Language: Shell

License: none

Higher-rated alternatives

luheng/deep_srl

Code and pre-trained model for: Deep Semantic Role Labeling: What Works and What's Next

sileod/tasksource

Datasets collection and preprocessings framework for NLP extreme multitask learning

loomchild/maligna

Bilingual sentence aligner

CK-Explorer/DuoSubs

Semantic subtitle aligner and merger for bilingual subtitle syncing.

coastalcph/lex-glue

LexGLUE: A Benchmark Dataset for Legal Language Understanding in English
