wszqkzqk/qt-web-extractor
Web content extraction engine backed by Qt WebEngine.
This tool is for gathering up-to-date information from modern websites, especially those built with JavaScript frameworks, and for extracting content from PDF documents. It takes a web page URL or a PDF document as input and returns clean, readable Markdown or HTML, suitable for feeding into AI models or other data processing workflows. It is aimed at anyone doing web research, aggregating content, or building AI agents that interact with web content.
Use this if you need to reliably extract content from dynamic web pages that use JavaScript or require login, or if you need to pull text directly from PDF files for further analysis or AI processing.
Not ideal if you only need to fetch static HTML content without JavaScript rendering, or if you require full browser automation features like clicking buttons or filling forms.
Stars: 11
Forks: -
Language: Python
License: GPL-3.0
Category:
Last pushed: Mar 27, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/perception/wszqkzqk/qt-web-extractor"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
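The same request can be made from Python. The sketch below is a minimal illustration using only the standard library. The helper names `build_perception_url` and `fetch_perception` are hypothetical, and the response format is not documented on this page, so the code assumes JSON and falls back to the raw text if parsing fails.

```python
import json
import urllib.request
from urllib.parse import quote

# Endpoint prefix taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/perception"


def build_perception_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository (hypothetical helper)."""
    # Percent-encode each path segment so unusual characters stay inside it.
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_perception(owner: str, repo: str, timeout: float = 10.0):
    """Fetch the data for one repository (hypothetical helper).

    Assumption: the body is JSON. If it is not, the raw text is returned.
    """
    url = build_perception_url(owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = resp.read().decode("utf-8")
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return body


# Example call (performs a network request, so it is left commented out):
# data = fetch_perception("wszqkzqk", "qt-web-extractor")
```

Without a key, each call counts toward the 100 requests/day limit, so caching results locally is worth considering when polling many repositories.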
Higher-rated alternatives
seleniumbase/SeleniumBase
APIs for browser automation, testing, and bypassing bot-detection.
apify/crawlee-python
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers....
intoli/user-agents
A JavaScript library for generating random user agents with data that's updated daily.
apify/crawlee
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In...
Kaliiiiiiiiii-Vinyzu/patchright
Undetected version of the Playwright testing and automation library.