Pro-GenAI/AutoPureData
Automated Filtering of Undesirable Web Data to Update LLM Knowledge
This project helps anyone maintaining a large language model (LLM) keep its knowledge up to date with current web information. It takes raw, unfiltered web data and automatically cleans it, removing unsafe content, text from unreliable sources, personal details, and adversarial attacks. The output is a refined dataset ready for updating your LLM, so that its responses are more likely to be accurate and safe.
No commits in the last 6 months.
Use this if you need to feed your LLM the latest web information but are concerned about the quality and safety of raw internet data.
Not ideal if you need a solution for a production environment, as this project is currently intended for educational and research purposes only.
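The four filtering categories named in the description (unsafe content, unreliable sources, personal details, adversarial attacks) can be illustrated with a small rule-based sketch. This is not the project's actual method. The blocklist, keyword lists, regex, and the names `flag_row` and `filter_rows` are all hypothetical and chosen only for illustration. A real pipeline would likely use stronger classifiers than keyword matching.

```python
import re
from urllib.parse import urlparse

# All lists and patterns below are hypothetical placeholders, not taken from the project.
UNRELIABLE_DOMAINS = {"example-spam.test"}
UNSAFE_TERMS = ("how to build a bomb",)
INJECTION_PHRASES = ("ignore previous instructions",)
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")  # crude check for one kind of personal detail


def flag_row(url, text):
    """Return the list of reasons a row should be excluded (empty if it looks clean)."""
    reasons = []
    host = (urlparse(url).hostname or "").lower()
    lowered = text.lower()
    if host in UNRELIABLE_DOMAINS:
        reasons.append("unreliable_source")
    if any(term in lowered for term in UNSAFE_TERMS):
        reasons.append("unsafe")
    if EMAIL_RE.search(text):
        reasons.append("personal_data")
    if any(phrase in lowered for phrase in INJECTION_PHRASES):
        reasons.append("adversarial")
    return reasons


def filter_rows(rows):
    """Split rows (dicts with 'url' and 'text') into kept rows and flagged rows."""
    kept, flagged = [], []
    for row in rows:
        reasons = flag_row(row["url"], row["text"])
        if reasons:
            flagged.append({**row, "reasons": reasons})
        else:
            kept.append(row)
    return kept, flagged
```

Keeping the flagged rows together with their reasons, rather than silently dropping them, makes it possible to review what the filter removed before the refined dataset is used.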
Stars: 8
Forks: 2
Language: Jupyter Notebook
License: not listed
Category:
Last pushed: Sep 18, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/Pro-GenAI/AutoPureData"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
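The same request can be made from Python using only the standard library. This is a minimal sketch: the helper names `quality_url` and `fetch_quality` are hypothetical, the idea that other repositories follow the same `owner/repo` path pattern is an assumption, and so is the expectation that the endpoint returns JSON.

```python
import json
import urllib.request

# Base path copied from the curl example; reuse for other repositories is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/rag"


def quality_url(owner, repo):
    """Build the endpoint URL for one repository."""
    return f"{BASE_URL}/{owner}/{repo}"


def fetch_quality(owner, repo, timeout=10.0):
    """Fetch the record for a repository, assuming the response body is JSON."""
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("Pro-GenAI", "AutoPureData")` issues the same GET request as the curl command and counts against the daily request limit.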
Higher-rated alternatives
WangRongsheng/awesome-LLM-resources
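🧑🚀 The world's best summary of LLM resources (multimodal generation, agents, coding assistance, AI paper review, data processing, model training, model inference, o1 models, MCP, small language models, vision-language models) | Summary of the...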
SylphAI-Inc/AdalFlow
AdalFlow: The library to build & auto-optimize LLM applications.
LazyAGI/LazyLLM
Easiest and laziest way for building multi-agent LLMs applications.
luhengshiwo/LLMForEverybody
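LLM knowledge sharing that everyone can understand; a must-read before LLM interviews in the spring and autumn recruitment seasons, so you can talk confidently with your interviewer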
katanaml/sparrow
Structured data extraction and instruction calling with ML, LLM and Vision LLM