gidim/Babler
Data Collection System For NLP/Speech Recognition
Babler helps you gather relevant text conversations from Twitter, blogs, and forums in over 500 languages. You provide a list of keywords or topics you're interested in, and Babler automatically collects and cleans the corresponding posts. This is ideal for researchers, data scientists, or anyone needing real-world conversational data to train language models, perform sentiment analysis, or improve keyword search.
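The keyword-driven "collect and clean" step described above can be sketched in a few lines. This is a hypothetical illustration, not Babler's actual API. The class `PostCleaner` and its methods are invented for this example. It assumes posts have already been fetched as plain strings, keeps those that mention at least one keyword, and applies simple cleaning: it strips URLs and @mentions and collapses whitespace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch only (not Babler's API): keeps posts that mention a
 * keyword, removing URLs and @mentions and normalizing whitespace.
 */
public class PostCleaner {

    // Matches http(s) URLs up to the next whitespace character.
    private static final Pattern URL = Pattern.compile("https?://\\S+");
    // Matches Twitter-style @mentions (ASCII word characters only).
    private static final Pattern MENTION = Pattern.compile("@\\w+");
    // Matches any run of whitespace.
    private static final Pattern SPACES = Pattern.compile("\\s+");

    /** Removes URLs and mentions, then collapses and trims whitespace. */
    static String clean(String post) {
        String noUrls = URL.matcher(post).replaceAll(" ");
        String noMentions = MENTION.matcher(noUrls).replaceAll(" ");
        return SPACES.matcher(noMentions).replaceAll(" ").trim();
    }

    /** Returns the cleaned posts whose text contains any keyword (case-insensitive). */
    static List<String> filterAndClean(List<String> posts, List<String> keywords) {
        List<String> result = new ArrayList<>();
        for (String post : posts) {
            String cleaned = clean(post);
            String lower = cleaned.toLowerCase(Locale.ROOT);
            for (String keyword : keywords) {
                if (lower.contains(keyword.toLowerCase(Locale.ROOT))) {
                    result.add(cleaned);
                    break; // one matching keyword is enough
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> posts = List.of(
                "@bob The weather today is great! https://t.co/abc",
                "Totally unrelated post",
                "   Rain   again in the WEATHER report   ");
        // Prints the two weather-related posts, cleaned.
        filterAndClean(posts, List.of("weather")).forEach(System.out::println);
    }
}
```

Plain substring matching is a deliberate simplification. Languages written without spaces, or with rich morphology, would need proper tokenization. A real collector would also deduplicate posts and apply language identification.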
No commits in the last 6 months.
Use this if you need large amounts of clean, conversational text data for natural language processing or speech recognition tasks, especially in less common languages.
Not ideal if you need data from sources other than Twitter, blogs, or forums, or if you prefer a system with a graphical user interface.
Stars: 25
Forks: 12
Language: Java
License: Apache-2.0
Last pushed: Apr 20, 2021
Commits (30d): 0
Higher-rated alternatives
rosette-api/java: Babel Street Analytics Client Library for Java
kermitt2/entity-fishing: A machine learning tool for fishing entities
vinhkhuc/JFastText: Java interface for fastText
CeON/CERMINE: Content ExtRactor and MINEr
vinhkhuc/jcrfsuite: Java interface for CRFsuite: http://www.chokkan.org/software/crfsuite/