aviaefrat/cryptonite
The Official Repository of the Cryptonite Dataset
This dataset helps natural language processing (NLP) researchers evaluate how well their language models handle extreme linguistic ambiguity. It takes cryptic crossword clues as input and challenges models to find the correct answer, which often involves complex wordplay and hidden meanings. It's designed for NLP scientists and computational linguists pushing the boundaries of language understanding.
No commits in the last 6 months.
Use this if you are an NLP researcher developing or testing language models and need a robust benchmark for understanding highly ambiguous language.
Not ideal if you are looking for a dataset to solve standard crosswords or to train models on straightforward language understanding tasks.
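To make the evaluation task concrete, below is a minimal Python sketch of exact-match scoring on clue/answer pairs. It assumes the data is stored as JSON Lines records with `clue` and `answer` fields, and that each clue ends with a parenthesized enumeration such as `(5)` or `(9,4)`. Treat both as assumptions and check them against the actual files in the repository. The helper names (`parse_enumeration`, `normalize`, `fits_enumeration`, `exact_match_accuracy`) are hypothetical. The two sample clues are well-known illustrative cryptics and do not come from the dataset.

```python
import json
import re
from typing import Callable, Iterable, List


def parse_enumeration(clue: str) -> List[int]:
    """Extract word lengths from a trailing enumeration like '(5)' or '(9,4)'.

    Assumption: the enumeration is written in parentheses at the end of the clue.
    """
    match = re.search(r"\(([\d,\- ]+)\)\s*$", clue)
    if not match:
        return []
    return [int(n) for n in re.split(r"[,\- ]+", match.group(1)) if n]


def normalize(answer: str) -> str:
    # Compare answers case-insensitively, ignoring spaces and hyphens.
    return re.sub(r"[\s\-]", "", answer).lower()


def fits_enumeration(answer: str, lengths: List[int]) -> bool:
    # A candidate must match the word lengths given by the enumeration.
    words = [w for w in re.split(r"[\s\-]+", answer.strip()) if w]
    return [len(w) for w in words] == lengths


def exact_match_accuracy(records: Iterable[dict], predict: Callable[[str], str]) -> float:
    # Fraction of clues whose predicted answer equals the gold answer after normalization.
    records = list(records)
    if not records:
        return 0.0
    correct = sum(normalize(predict(r["clue"])) == normalize(r["answer"]) for r in records)
    return correct / len(records)


# Hypothetical records in the assumed JSON Lines layout.
sample_jsonl = """\
{"clue": "Gegs (9,4)", "answer": "scrambled eggs"}
{"clue": "HIJKLMNO (5)", "answer": "water"}
"""
records = [json.loads(line) for line in sample_jsonl.splitlines()]

# A toy "model": a fixed lookup table with one right and one wrong answer.
toy_predictions = {"Gegs (9,4)": "scrambled eggs", "HIJKLMNO (5)": "ocean"}
acc = exact_match_accuracy(records, lambda clue: toy_predictions.get(clue, ""))
print(acc)  # prints 0.5
```

Note that the wrong guess "ocean" still fits the `(5)` enumeration. Length constraints narrow the search, but solving the clue still depends on resolving the wordplay.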
Stars: 23
Forks: 2
Language: Python
License: not specified
Category: (not listed)
Last pushed: Feb 19, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/aviaefrat/cryptonite"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
acl-org/acl-anthology
Data and software for building the ACL Anthology.
anoopkunchukuttan/indic_nlp_library
Resources and tools for Indian language Natural Language Processing
CLUEbenchmark/CLUECorpus2020
Large-scale pre-training corpus for Chinese (100 GB)
KennethEnevoldsen/scandinavian-embedding-benchmark
A Scandinavian Benchmark for sentence embeddings
Separius/awesome-sentence-embedding
A curated list of pretrained sentence and word embedding models