vakharwalad23/mark-minion
The Ultimate Web Content Extraction & Conversion Tool for AI/LLM Applications. Convert almost any web content into clean Markdown with intelligent AI processing.
Need to feed clean, structured content from online sources into your AI models or applications? This tool takes almost any web content (web pages, documents, videos, social media posts, and Google Docs) and converts it into Markdown or JSON. It intelligently filters out clutter such as ads, producing ready-to-use data for tasks like content analysis or AI training.
No commits in the last 6 months.
Use this if you are a data scientist, content strategist, or researcher who needs to systematically gather and clean diverse online content for AI model training, content analysis, or database population.
Not ideal if you need real-time, high-volume scraping with no usage limits: the free plan may hit daily processing caps on browser-rendered sites.
Stars: 12
Forks: 1
Language: TypeScript
License: MIT
Category:
Last pushed: Oct 08, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/perception/vakharwalad23/mark-minion"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
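The same request can be made from a script. The following is a minimal Python sketch, assuming only what the curl command shows: a plain GET on that URL with no key. The helper names `build_perception_url` and `fetch_perception` are hypothetical, and because this page does not describe the response format, the body is returned as raw text rather than parsed.

```python
import urllib.request
from urllib.parse import quote

# Base path copied from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/perception"


def build_perception_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository (hypothetical helper)."""
    # Percent-encode each path segment so unusual names cannot break the URL.
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_perception(owner: str, repo: str, timeout: float = 10.0) -> str:
    """Send an unauthenticated GET and return the raw response body as text.

    Assumption: the body is text that decodes as UTF-8; undecodable bytes
    are replaced instead of raising. Nothing is parsed here because the
    response format is not documented on this page.
    """
    url = build_perception_url(owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Calling `fetch_perception("vakharwalad23", "mark-minion")` issues the same request as the curl command. Keyless access is limited to 100 requests per day, so a script that polls many repositories should cache results.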
Higher-rated alternatives
seleniumbase/SeleniumBase
APIs for browser automation, testing, and bypassing bot-detection.
apify/crawlee-python
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers....
intoli/user-agents
A JavaScript library for generating random user agents with data that's updated daily.
apify/crawlee
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In...
Kaliiiiiiiiii-Vinyzu/patchright
Undetected version of the Playwright testing and automation library.