duriantaco/pykomodo
A Python-based parallel file chunking system designed for processing large codebases into LLM-friendly chunks.
This tool helps AI engineers and machine learning practitioners prepare large collections of source code and PDF documents for use with large language models (LLMs). It ingests various file types, processes them in parallel, and outputs smaller, contextually rich chunks, including a JSONL format compatible with tools like LangChain and LlamaIndex. It is designed for developers building LLM-powered applications or knowledge retrieval systems from extensive codebases.
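To make the described pipeline concrete (walk a directory, chunk files in parallel, write one JSON object per chunk), here is a minimal sketch using only the standard library. It is not pykomodo's API: the functions `chunk_file` and `chunk_tree`, the line-aligned splitting rule, and the record fields `source`, `chunk_index`, and `text` are all hypothetical choices made for illustration.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from pathlib import Path


def chunk_file(path: str, max_chars: int = 2000) -> list[dict]:
    """Hypothetical helper: split one file into line-aligned chunks.

    Lines are accumulated until adding the next one would exceed max_chars;
    a single line longer than max_chars is kept whole as its own chunk.
    """
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    chunks, current, size = [], [], 0
    for line in text.splitlines(keepends=True):
        if current and size + len(line) > max_chars:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        chunks.append("".join(current))
    # One record per chunk; field names are illustrative, not pykomodo's schema.
    return [{"source": path, "chunk_index": i, "text": c} for i, c in enumerate(chunks)]


def chunk_tree(root: str, out_path: str, extensions=(".py",),
               max_chars: int = 2000, workers=None) -> int:
    """Hypothetical driver: chunk matching files in parallel and write JSONL.

    Returns the number of chunk records written.
    """
    files = sorted(
        str(p) for p in Path(root).rglob("*")
        if p.is_file() and p.suffix in extensions
    )
    # Threads overlap file I/O; for CPU-heavy chunking a ProcessPoolExecutor
    # could be swapped in (the worker function would then need to be picklable).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(partial(chunk_file, max_chars=max_chars), files))
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for records in results:
            for rec in records:
                out.write(json.dumps(rec) + "\n")
                count += 1
    return count
```

Each output line is an independent JSON object, which is what makes JSONL convenient to stream into document loaders; real tools typically add richer metadata and smarter split points than this line-based rule.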
No commits in the last 6 months.
Use this if you need to break down large code repositories or document sets into manageable, high-quality chunks that provide maximum value to an LLM.
Not ideal if you only need to perform basic text splitting without any of the advanced filtering, metadata extraction, or LLM-specific optimizations.
Stars: 47
Forks: 1
Language: Python
License: Apache-2.0
Last pushed: Aug 13, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/duriantaco/pykomodo"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
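For scripting, the same request can be made from Python with the standard library. This is a sketch under assumptions: the split of the URL into a `category` segment (`llm-tools`) and a `repo` segment is inferred from the single example URL above, the response is assumed to be JSON, and `quality_url` and `fetch_quality` are hypothetical helpers. The page does not document how an API key is sent, so keyed access is not shown.

```python
import json
import urllib.request

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, repo: str) -> str:
    """Hypothetical helper: build the endpoint URL, e.g. for 'llm-tools' and 'owner/name'."""
    return f"{BASE_URL}/{category}/{repo}"


def fetch_quality(category: str, repo: str, timeout: float = 10.0):
    """Fetch the record anonymously (assumes the endpoint returns JSON)."""
    with urllib.request.urlopen(quality_url(category, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `fetch_quality("llm-tools", "duriantaco/pykomodo")` requests the same URL as the curl command and counts against the anonymous 100 requests/day limit.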
Higher-rated alternatives
icereed/paperless-gpt
Use LLMs and LLM Vision (OCR) to handle paperless-ngx - Document Digitalization powered by AI
CTU-LinguTechies/VN-Law-Advisor
An application for looking up and answering questions about legal knowledge, based on Vietnam's Legal Code (Bộ pháp điển) and its database of legal normative documents.
SharanyaAchanta/LexTransition-AI
LexTransition AI is an open-source, offline-first legal assistant. It helps users navigate the...
TusharSaini999/HackIndia-Spark-9-2025-Code-Hackers
VakeelAI is an AI-powered legal assistant that answers questions related to Indian laws. It...
lvwzhen/law-cn-ai
⚖️ AI legal assistant