Web Scraping NLP Pipelines NLP Tools
End-to-end systems that combine web scraping with NLP analysis (sentiment, readability, topic modeling, entity extraction) on text extracted from websites, articles, or online sources. Does NOT include standalone scraping tools, NLP libraries, or applications that only perform analysis without web data extraction.
There are 96 web scraping NLP pipeline tools tracked. One scores above 70 (verified tier). The highest-rated is flairNLP/fundus at 72/100 with 443 stars.
Get all 96 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=nlp&subcategory=web-scraping-nlp-pipelines&limit=20"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
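The same request can be made from Python. The following is a minimal sketch using only the standard library; the base URL and query parameters are copied from the curl example above, while the helper names `build_url` and `fetch_projects` are hypothetical. The layout of the JSON response is not described on this page, so the sketch returns the decoded payload unchanged instead of assuming field names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base endpoint taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(limit=20):
    """Build the dataset URL with the query parameters shown in the curl example."""
    params = {
        "domain": "nlp",
        "subcategory": "web-scraping-nlp-pipelines",
        "limit": limit,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_projects(limit=20, timeout=30):
    """Fetch the endpoint and decode the body as JSON.

    The response schema is an unknown here, so the decoded
    object is returned as-is without touching any fields.
    """
    with urlopen(build_url(limit), timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_projects()` performs one request, which counts against the 100 requests/day keyless limit. Whether the endpoint accepts a `limit` larger than the 20 used in the curl example is an assumption that would need checking against the API itself.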
| # | Tool | Score | Tier |
|---|---|---|---|
| 1 | flairNLP/fundus: A very simple news crawler with a funny name | 72 | Verified |
| 2 | fhamborg/news-please: news-please - an integrated web crawler and information extractor for news... | | Established |
| 3 | affjljoo3581/canrevan: A library for collecting large volumes of Naver news articles. | | Established |
| 4 | FreeDiscovery/FreeDiscovery: Web Service for E-Discovery Analytics | | Established |
| 5 | tirthajyoti/Web-Database-Analytics: Web scraping and related analytics using Python tools | | Established |
| 6 | Multiverse-of-Projects/NewsAI: A dynamic NewsAI dashboard that uses NLP to analyze news articles, visualize... | | Emerging |
| 7 | rajaswa/DRIFT: DRIFT is a tool for Diachronic Analysis of Scientific Literature. | | Emerging |
| 8 | smyja/blackmaria: Python package for webscraping in Natural language | | Emerging |
| 9 | MasuRii/FBScrapeIdeas: Modern CLI tool for scraping & analyzing Facebook groups using Playwright &... | | Emerging |
| 10 | kevalmorabia97/SEDTWik-Event-Detection-from-Tweets: Segmentation based event detection from Tweets. Published at NAACL SRW 2019 | | Emerging |
| 11 | uhh-lt/newsleak: Information extraction and interactive visualization of textual datasets for... | | Emerging |
| 12 | vipul-sharma20/sharingan: Tool to extract news articles from newspaper and give the context about the news | | Emerging |
| 13 | sandeep-sandhu/NewsLookout: The NewsLookout web scraping application with NLP and data pre-processing | | Emerging |
| 14 | FinnishCancerRegistry/gleason_extraction_py: Extract Gleason scores from texts. | | Emerging |
| 15 | uscensusbureau/SABLE: Scraping Assisted by Learning | | Emerging |
| 16 | ahmedbesbes/How-to-mine-newsfeed-data-and-extract-interactive-insights-in-Python: A practical guide to topic mining and interactive visualizations | | Emerging |
| 17 | Sotera/watchman: Watchman: An open-source social-media event-detection system | | Emerging |
| 18 | nawaz-kmr/Data_Extraction_and_Text_Analysis_for_Blackcoffer_company.: The objective of this assignment is to extract textual data articles from... | | Emerging |
| 19 | VIDA-NYU/domain_discovery_API: Domain Discovery Operations API formalizes the human domain discovery... | | Emerging |
| 20 | scrapegoat/scrapegoat: Scrape Data in One-shot. | | Emerging |
| 21 | Just-Helpful/preventable-deaths-scraper: Web scraper, written for the Preventable Deaths website, with emphasis on... | | Emerging |
| 22 | nakuleshj/news-nlp-pipeline: A fully serverless, event-driven data pipeline that ingests, enriches,... | | Emerging |
| 23 | Jasiri-App/datagpu: DataGPU is an open-source data compiler for AI pipelines that helps you... | | Emerging |
| 24 | networkdynamics/seldonite: A News Article Collection Library | | Emerging |
| 25 | victoria217-bottino/google-news-scraper: 📰 Google News Scraper. A Python tool to fetch, decode, and process... | | Emerging |
| 26 | lkstrp/newspaper-scraper: The all-in-one Python package for seamless newspaper article indexing,... | | Emerging |
| 27 | nostoz/news_monitor: Real time news monitor aggregating from various sources based on keywords | | Emerging |
| 28 | gangula-karthik/KAKI-App: A web app uniting everyone for big wins and a greener Singapore! 🚀🌳 | | Emerging |
| 29 | ZIADEA/SmartWebScraper-CV: SmartWebScraper-CV – AI-Powered Web Page Zone Detection SmartWebScraper-CV... | | Emerging |
| 30 | SakuraPuare/ZhiHu_Spider: Web scraper for Zhihu content extraction | | Emerging |
| 31 | ntddk/peeling-onions: A repository to store Deep Web (onion domain) crawler, scraper, and NLP... | | Emerging |
| 32 | BioinfoNet/Data-mining: Data mining to discover trends in Open Science in Kenya | | Emerging |
| 33 | jasp9559/Web-Scraping-of-Indian-Judgements: Web scraping project for scraping the latest/most recent judgement taken on the day | | Emerging |
| 34 | antoninfaure/rssTrends: Finding Topics in French News using RSS Feeds | | Emerging |
| 35 | sodalabsio/event-detection-extraction: Repository for QA-based event detection and extraction from news and social media. | | Experimental |
| 36 | susannapaoli/web-scraper-nyt: New York Times Scraper | | Experimental |
| 37 | aybarskerem/WebScraper: This repo contains Various WebScrapers for different sites and process the... | | Experimental |
| 38 | GateNLP/wpextract: Create datasets from WordPress sites for research or archiving | | Experimental |
| 39 | bhx98/NameAnalysis: Choosing a company name by analyzing the most used keywords in the field and... | | Experimental |
| 40 | jpwahle/cs-insights-crawler: This repository implements the interaction with DBLP, information extraction... | | Experimental |
| 41 | dobbersc/fundus-evaluation: [ACL 2024] Evaluation of the Fundus News Scraper | | Experimental |
| 42 | Atharv279/Task-Extraction-NLP: NLP-based Task Extraction & Categorization. This project extracts tasks... | | Experimental |
| 43 | Awakumori/NGAspider: Crawler for the NGA forum (National Geographic of Azeroth). Uses multithreaded collection, MongoDB storage, and integrated PaddlePaddle for NLP. Integrates Baidu Jieyu for entity recognition, updates NLP... | | Experimental |
| 44 | WISETICT-PPAM/Data-Analytics: Product information crawling and review text mining | | Experimental |
| 45 | agi-templar/MediaCloudDataDownloader: Download full-length articles from media outlets. | | Experimental |
| 46 | balaurian/fx_news_scraper: A scraper for investing.com forex news using beautifulsoup and nltk. It also... | | Experimental |
| 47 | dukeblue1994-glitch/chronicle: Intelligent event detection system using semantic embeddings, MinHash LSH... | | Experimental |
| 48 | someoneorlov/styx: ML News Analysis Service | | Experimental |
| 49 | AmmarRashed/EventOrient: A web-based application for monitoring, analyzing and visualizing social... | | Experimental |
| 50 | Kamomille/WebScrapping_Supermarket: Supermarket cost analysis | | Experimental |
| 51 | zer0Percent/OhWowBREAKINGNews: A multithreaded scraper to retrieve and parse news articles. | | Experimental |
| 52 | samuelhatcliff/newstracker: News Tracker is an application designed to enhance and optimize the way that... | | Experimental |
| 53 | nivaangupta/news-website: A news website that provides summarised news on trending topics, popular... | | Experimental |
| 54 | stkisengese/news-intelligence-nlp-platform: A Python-based NLP platform for scraping, analyzing, and enriching news... | | Experimental |
| 55 | georgiarichards/preventabledeathstracker: Code for running the Preventable Deaths Tracker website | | Experimental |
| 56 | MANISH007700/NewsArticleExtraction: Extraction of News Article from different News Web Pages using feedparser... | | Experimental |
| 57 | asaifuddin18/Search-Engine-Data-Collector: Summer '21 research project under Forward Data Lab group. Django website... | | Experimental |
| 58 | stuartemiddleton/floraguard_crawler: FloraGuard crawler for online forums and marketplaces around the illegal... | | Experimental |
| 59 | moehmeni/ezweb: Easy to use web page analyzer | | Experimental |
| 60 | umutkavakli/sikayetvar-scraping: A scraping tool for customer complaints of specified brands to use in NLP tasks. | | Experimental |
| 61 | satyampandey1411/SAT-News-Analyser: SAT News Analyser is a web application offering in-depth news article... | | Experimental |
| 62 | b-i-king/Top_News_Twitter_Bot_Template: Twitter Bot Template | | Experimental |
| 63 | javiermascarena/footy-narratives: Automated weekly storylines and topic summaries for the “Big Six” English... | | Experimental |
| 64 | Anonym0usWork1221/python-code-docstring-scraper: A multi-threaded GitHub scraper to collect Python code with docstrings from... | | Experimental |
| 65 | utkarsh512/CreateDebateScraper: Scraping debates from the CreateDebate forum | | Experimental |
| 66 | Biswas-N/Norman-PD-incidents-extractor: Python based utility to create Norman Police Department's incident dataset... | | Experimental |
| 67 | LiliValGo/NLP-for-IPCC-Climate-Reports: This project combines web scraping, PDF processing, and Natural Language... | | Experimental |
| 68 | ArpitaChatterjee/Routine-Analysis-of-a-Comedian: Build a dataset using the transcript for the 10 popular comedians, using web... | | Experimental |
| 69 | Onaga08/scrape-and-sense: A comprehensive script for web scraping and NLP analysis, providing detailed... | | Experimental |
| 70 | doinakis/Real-Time-News-Assistant: Real Time News Assistant for Greek news. | | Experimental |
| 71 | eyereece/nlp-text-mining-dashboard: nlp text mining dashboard to explore current trends and extract most used... | | Experimental |
| 72 | J-TECH-bot/Blackcoffer_Data_Extraction_NLP: This repository showcases data-driven text analytics using NLP techniques.... | | Experimental |
| 73 | estefaniagPerez/net-analyzer-sna-nlp-analysis: This project (ReactJS and Python) combines Social Network Analysis (SNA) and... | | Experimental |
| 74 | DolbyUUU/event-timeline-generation-olympics: A toy system for generating event timelines from social media data,... | | Experimental |
| 75 | SaltyGod/Text-Data-Mining: A standard end-to-end project for text crawling and in-depth mining analysis | | Experimental |
| 76 | adityamangal1/Web-Scraping: web data extraction | | Experimental |
| 77 | AtulJoshi1/ProductDescription2Keywords: Extracting Search Engine Appropriate Keywords and Key Selling Points from a... | | Experimental |
| 78 | nikitaprasad21/Data-Extraction-and-NLP: Performed Data Extraction and NLP Analysis | | Experimental |
| 79 | IshtyM/Data-Extraction-and-Text-Analytics: Text Analysis that includes extraction of word count, Positive Score,... | | Experimental |
| 80 | ElfatihZiad/BBCNews-scraper-nlp: A data pipeline to extract News articles from BBC News, storing it to... | | Experimental |
| 81 | vansh-py04/Data-Extraction-and-Text-Analysis: The objective of this assignment is to extract textual data articles from... | | Experimental |
| 82 | pranjal-pravesh/web-article-analyzer: A comprehensive text analysis system that performs web scraping, sentiment... | | Experimental |
| 83 | QuhiQuhihi/news_analysis: crawling news data and extract keywords from article | | Experimental |
| 84 | DRSarcenoR/fetchNews: Streamlit application that, given a prompt (a name is expected), shows... | | Experimental |
| 85 | Mreeb/TOpic_name_eXtraction: Department of Justice 2009-2018 Press Releases Data and reading Analysing... | | Experimental |
| 86 | kshitijbhandari/Web-Scraping-and-text-analysis: NLP pipeline to scrape 114 articles using BeautifulSoup and compute 13... | | Experimental |
| 87 | Haimonmon/snippy: A book scraping bot that is able to give you books data, but be cautious as... | | Experimental |
| 88 | AnFrBo/internet_censorship: Analysis of the State of Internet Censorship in the United Kingdom Using... | | Experimental |
| 89 | crackalamoo/web-nlp-scraper: A command line tool to quickly run natural language processing (NLP)... | | Experimental |
| 90 | yashvardhanv/Atomic-news3.0: Upgraded version of AtomicNews2.0 with login/signup features. | | Experimental |
| 91 | rogerchang1108/Cambridge-Dictionary-Web-Scraper: In this project, we employ the BeautifulSoup4 package in Python Jupyter... | | Experimental |
| 92 | tasozgurcem11/eksi-analysis: Collect and analyze eksi forum public entries | | Experimental |
| 93 | mccormd1/RandM_Transcript_Sentiment_Analysis: Various html scraping and NLP techniques applied to Rick & Morty transcripts. | | Experimental |
| 94 | liuzl/newsmth: A go crawler for newsmth.net | | Experimental |
| 95 | solinode/narratix: tuned to the noise before it becomes signal. | | Experimental |
| 96 | krishgoyal0/BookMyShow_event_scrapper_automation: This is a project made for automating data scraping from a particular Event... | | Experimental |