brandonleekramer/tidyorgs
A tidy package that detects and standardizes organizations in unstructured text data
This R package helps researchers, analysts, and policymakers find organization names in messy text and standardize them across sectors such as academia, business, government, and nonprofits. You provide unstructured text or email domains, and it returns standardized organization names with a sector classification for each. It suits anyone analyzing affiliations in large datasets with inconsistent text entries.
No commits in the last 6 months.
Use this if you need to clean and categorize organization names from raw text fields or email addresses for social, economic, or policy analysis.
Not ideal if your data is already perfectly standardized or if you only need to extract organizations without categorizing them by sector.
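To make the input and output described above concrete (raw text or an email address in, a standardized name and sector out), here is a minimal Python sketch of dictionary-based matching. It is not the tidyorgs API and does not claim to reproduce how the package works internally. The lookup tables, the function names `standardize_text` and `standardize_email`, and every entry in them are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical lookup table: regex pattern -> (standardized name, sector).
# Illustrative entries only; they are not taken from tidyorgs.
ORG_PATTERNS = [
    (re.compile(r"\b(?:university of virginia|univ\.? of virginia|uva)\b", re.I),
     ("University of Virginia", "academic")),
    (re.compile(r"\b(?:google|alphabet)\b", re.I),
     ("Google", "business")),
    (re.compile(r"\bnasa\b", re.I),
     ("NASA", "government")),
]

# Hypothetical email-domain table: domain -> (standardized name, sector).
EMAIL_DOMAINS = {
    "virginia.edu": ("University of Virginia", "academic"),
    "google.com": ("Google", "business"),
    "nasa.gov": ("NASA", "government"),
}


def standardize_text(text):
    """Return (name, sector) for the first pattern found in text, else None."""
    for pattern, result in ORG_PATTERNS:
        if pattern.search(text):
            return result
    return None


def standardize_email(email):
    """Return (name, sector) for an email address, else None.

    Subdomains are stripped one level at a time, so "cs.virginia.edu"
    falls back to "virginia.edu".
    """
    domain = email.rsplit("@", 1)[-1].lower()
    parts = domain.split(".")
    for i in range(len(parts) - 1):
        candidate = ".".join(parts[i:])
        if candidate in EMAIL_DOMAINS:
            return EMAIL_DOMAINS[candidate]
    return None


if __name__ == "__main__":
    for raw in ["PhD student @ Univ. of Virginia", "Software engineer, google", "freelancer"]:
        print(raw, "->", standardize_text(raw))
    for addr in ["jane@cs.virginia.edu", "someone@gmail.com"]:
        print(addr, "->", standardize_email(addr))
```

Entries that match nothing return `None`, which mirrors the case where a text field or email domain cannot be tied to a known organization.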
Stars: 7
Forks: 1
Language: R
License: MIT
Category:
Last pushed: Dec 13, 2021
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/brandonleekramer/tidyorgs"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
Higher-rated alternatives
quanteda/quanteda
An R package for the Quantitative Analysis of Textual Data
juliasilge/tidytext
Text mining using tidy tools ✨📄✨
massimoaria/tall
Text Analysis for aLL
keyATM/keyATM
An R package for Keyword Assisted Topic Models
gagolews/stringi
Fast and Portable Character String Processing in R (with the Unicode ICU)