DiscovAI/DiscovAI-crawl

🕷️ DiscovAI Crawl API (🚧 Work in Progress 🚧): A powerful web scraping solution for AI tools and vector databases. Extract clean HTML, generate LLM-friendly content, and create embeddings from any URL.


Experimental

This tool helps AI developers and data engineers quickly gather and process web content for their AI applications. It takes any URL as input and produces clean, ad-free text, Markdown, key information, and even embeddings, ready for use in large language models or vector databases. This is ideal for teams building AI tools that need to ingest and understand web-based information.
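As a rough illustration of the kind of processing described above, the sketch below turns raw HTML into plain text and splits it into chunks that could later be embedded. It is not DiscovAI Crawl's actual API: `htmlToText`, `chunkText`, and the `maxWords` parameter are hypothetical names chosen for this example, and the regex-based cleanup is a simplification that does not decode HTML entities or detect ads.

```typescript
// Hypothetical sketch, not the project's real implementation.

// Remove <script> and <style> blocks, strip remaining tags,
// and collapse runs of whitespace into single spaces.
function htmlToText(html: string): string {
  return html
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// Split cleaned text into chunks of at most `maxWords` words,
// a common preparation step before generating embeddings.
function chunkText(text: string, maxWords: number): string[] {
  const words = text.split(" ").filter((w) => w.length > 0);
  const chunks: string[] = [];
  for (let i = 0; i < words.length; i += maxWords) {
    chunks.push(words.slice(i, i + maxWords).join(" "));
  }
  return chunks;
}

const sampleHtml =
  "<html><head><style>p{color:red}</style></head>" +
  "<body><p>Hello <b>world</b></p><script>alert(1)</script></body></html>";

const cleaned = htmlToText(sampleHtml);
const chunks = chunkText(cleaned, 1);
```

Each chunk would then be passed to an embedding model of your choice before being written to a vector database; that step is omitted here because it depends on the provider.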

No commits in the last 6 months.

Use this if you need to systematically scrape web pages, extract specific information, and prepare that content in an AI-ready format for your language models or vector databases.

Not ideal if you're looking for a simple, general-purpose web scraper for personal use or if your main goal is traditional data analysis rather than AI application development.

AI development, data engineering, web content processing, LLM data preparation, vector database integration

Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 6 / 25

Maturity 16 / 25

Community 5 / 25


Language: TypeScript

License: Apache-2.0

Higher-rated alternatives

dsba6010-llm-applications/baemax_tc

LLM App to demystify and summarize Terms and Conditions agreements

brettlyy/text-to-sql

An application to write and run SQL queries, returning answers to natural language questions,...

bhattbhavesh91/pdf-qa-astradb-langchain

Explore how to build a Q&A system on PDF files using AstraDB's Vector DB with LangChain and OpenAI APIs

techdomegh/ai-news-scraper

AI News Scraper & Semantic Search: A Python application that scrapes news articles, uses GenAI...

lightfeed/sdk

Lightfeed SDK to search and filter web data
