DiscovAI/DiscovAI-crawl
🕷️ DiscovAI Crawl API (🚧 Work in Progress 🚧): A powerful web scraping solution for AI tools and vector databases. Extract clean HTML, generate LLM-friendly content, and create embeddings from any URL.
This tool helps AI developers and data engineers quickly gather and process web content for their AI applications. It takes any URL as input and produces clean, ad-free text, Markdown, key information, and even embeddings, ready for use in large language models or vector databases. This is ideal for teams building AI tools that need to ingest and understand web-based information.
No commits in the last 6 months.
Use this if you need to systematically scrape web pages, extract specific information, and prepare that content in an AI-ready format for your language models or vector databases.
Not ideal if you're looking for a simple, general-purpose web scraper for personal use or if your main goal is traditional data analysis rather than AI application development.
Stars: 19
Forks: 1
Language: TypeScript
License: Apache-2.0
Category:
Last pushed: Aug 05, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/vector-db/DiscovAI/DiscovAI-crawl"
Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
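The same request can be made from a script. The sketch below is a minimal, hedged example using only the Python standard library. It assumes the three path segments after `/quality/` are a category, a repository owner, and a repository name (inferred from the curl URL above, not documented here), and that the response body is JSON. The function names `quality_url` and `fetch_quality` are illustrative, not part of any published client.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL in the same shape as the curl example.

    Assumption: the path is /quality/<category>/<owner>/<repo>.
    """
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Send a GET request and parse the body.

    Assumption: the endpoint returns JSON; the field names are not
    documented here, so the parsed value is returned unmodified.
    """
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Prints the URL only; call fetch_quality(...) to perform the request.
    print(quality_url("vector-db", "DiscovAI", "DiscovAI-crawl"))
```

Keeping URL construction separate from the network call makes it easy to check the request shape without spending any of the daily request quota.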
Higher-rated alternatives
dsba6010-llm-applications/baemax_tc
LLM App to demystify and summarize Terms and Conditions agreements
brettlyy/text-to-sql
An application to write and run SQL queries, returning answers to natural language questions,...
bhattbhavesh91/pdf-qa-astradb-langchain
Explore how to build a Q&A system on PDF files using AstraDB's Vector DB with Langchain and OpenAI APIs
techdomegh/ai-news-scraper
AI News Scraper & Semantic Search: A Python application that scrapes news articles, uses GenAI...
lightfeed/sdk
Lightfeed SDK to search and filter web data