DiscovAI/DiscovAI-crawl

🕷️ DiscovAI Crawl API (🚧 Work in Progress 🚧): A powerful web scraping solution for AI tools and vector databases. Extract clean HTML, generate LLM-friendly content, and create embeddings from any URL.

27 / 100 (Experimental)

This tool helps AI developers and data engineers quickly gather and process web content for their AI applications. It takes any URL as input and produces clean, ad-free text, Markdown, key information, and even embeddings, ready for use in large language models or vector databases. This is ideal for teams building AI tools that need to ingest and understand web-based information.

No commits in the last 6 months.

Use this if you need to systematically scrape web pages, extract specific information, and prepare that content in an AI-ready format for your language models or vector databases.

Not ideal if you're looking for a simple, general-purpose web scraper for personal use or if your main goal is traditional data analysis rather than AI application development.

AI development, data engineering, web content processing, LLM data preparation, vector database integration
Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 6 / 25
Maturity 16 / 25
Community 5 / 25

Stars: 19
Forks: 1
Language: TypeScript
License: Apache-2.0
Last pushed: Aug 05, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/vector-db/DiscovAI/DiscovAI-crawl"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
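
The same request can be made from code. The sketch below is a minimal TypeScript version of the curl command above, not an official client. It assumes the URL path follows a category/owner/repo pattern (inferred only from the single URL shown) and that the endpoint returns JSON; the response schema is not documented here, so the result is typed as `unknown`. The helper names `buildQualityUrl` and `fetchQuality` are hypothetical.

```typescript
// Base URL copied from the curl example.
const API_BASE = "https://pt-edge.onrender.com/api/v1/quality";

// Hypothetical helper. Assumes the path is /{category}/{owner}/{repo},
// which is inferred from the one URL shown and may not hold in general.
function buildQualityUrl(category: string, owner: string, repo: string): string {
  const parts = [category, owner, repo].map((p) => encodeURIComponent(p));
  return `${API_BASE}/${parts.join("/")}`;
}

// Hypothetical helper. Assumes a JSON response body; the schema is unknown,
// so callers should inspect or validate the value before using it.
async function fetchQuality(
  category: string,
  owner: string,
  repo: string
): Promise<unknown> {
  // Uses the global fetch available in browsers and Node.js 18+.
  const res = await fetch(buildQualityUrl(category, owner, repo));
  if (!res.ok) {
    // Surfaces HTTP errors, such as exceeding the daily request limit.
    throw new Error(`Request failed: ${res.status} ${res.statusText}`);
  }
  return res.json();
}
```

Calling `fetchQuality("vector-db", "DiscovAI", "DiscovAI-crawl")` requests the same URL as the curl command. No API key is sent, so the call counts against the keyless daily limit.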