juletx/corpus-linguistics
Corpus Linguistics slides, labs, assignments and data
This course material helps linguists and language researchers learn how to analyze large collections of text, known as corpora. You'll work with raw text data and learn methods for extracting insights such as common word pairings (collocations) or significant terms (keywords). It's designed for anyone studying language who wants to use computational methods to understand how language is used in real-world contexts.
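The course materials are written in R. The sketch below uses Python and is not taken from the course. It illustrates the two analyses named above under stated assumptions. Collocations are scored with pointwise mutual information (PMI) over adjacent word pairs. Keywords are scored with Dunning's log-likelihood (G2) against a reference corpus. These are common measures, but the course may use different ones. The function names `bigram_pmi` and `log_likelihood_keyness` are hypothetical.

```python
import math
from collections import Counter


def bigram_pmi(tokens, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (log base 2).

    Hypothetical helper: pairs seen fewer than `min_count` times are skipped,
    because PMI strongly overrates rare pairs.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_xy = count / n_bi            # joint probability of the pair
        p_x = unigrams[w1] / n_uni     # marginal probability of the first word
        p_y = unigrams[w2] / n_uni     # marginal probability of the second word
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def log_likelihood_keyness(target, reference):
    """Rank words over-represented in `target` relative to `reference` by G2.

    Hypothetical helper using the two-cell log-likelihood form:
    G2 = 2 * (a * ln(a / E1) + b * ln(b / E2)).
    """
    t_counts, r_counts = Counter(target), Counter(reference)
    t_total, r_total = len(target), len(reference)
    results = {}
    for word, a in t_counts.items():
        b = r_counts.get(word, 0)
        # Keep only positive keywords (relatively more frequent in the target).
        if a / t_total <= b / r_total:
            continue
        e1 = t_total * (a + b) / (t_total + r_total)  # expected count in target
        e2 = r_total * (a + b) / (t_total + r_total)  # expected count in reference
        g2 = 2 * (a * math.log(a / e1) + (b * math.log(b / e2) if b > 0 else 0.0))
        results[word] = g2
    return sorted(results.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    text = "new york is big and new york is busy".split()
    print(bigram_pmi(text))
    target = "the cat sat on the mat the cat".split()
    reference = "the dog sat on the log".split()
    print(log_likelihood_keyness(target, reference))
```

In the keyness example, "cat" ranks highest because it is frequent in the target and absent from the reference. "sat" and "on" are dropped because they are relatively more frequent in the reference.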
No commits in the last 6 months.
Use this if you are a linguistics student, researcher, or language enthusiast looking to understand and apply computational techniques to analyze large text datasets.
Not ideal if you are looking for a plug-and-play software tool for corpus analysis without learning the underlying methods and theory.
Stars: 7
Forks: not listed
Language: R
License: not listed
Category: not listed
Last pushed: Mar 13, 2022
Commits (30d): 0
Higher-rated alternatives
Helsinki-NLP/OpusFilter
OpusFilter - Parallel corpus processing toolkit
natasha/corus
Links to Russian corpora + Python functions for loading and parsing
darija-open-dataset/dataset
darija <-> english dataset
omicsNLP/Auto-CORPus
Auto-CORPus pipeline developed by a University of Nottingham and Imperial College London...
SergeyShk/ruTS
Library for extracting statistics from Russian-language texts.