duriantaco/pykomodo

A Python-based parallel file chunking system designed for processing large codebases into LLM-friendly chunks.

Experimental

This tool helps AI engineers and machine learning practitioners prepare large collections of source code and PDF documents for use with large language models (LLMs). It takes in various file types, processes them in parallel, and outputs smaller, contextually rich chunks optimized for LLMs, including a JSONL format compatible with tools like LangChain and LlamaIndex. This is designed for those building LLM-powered applications or knowledge retrieval systems from extensive codebases.
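The flow described above (read files in parallel, split them into smaller chunks, write JSONL) can be sketched with the Python standard library alone. This is a minimal illustration, not pykomodo's actual API: the function names `chunk_text`, `chunk_file`, and `chunk_files_to_jsonl`, the fixed line-count chunking, and the record fields `source`, `chunk_index`, and `text` are all assumptions made for the example.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def chunk_text(text, max_lines=50):
    """Split text into consecutive chunks of at most max_lines lines."""
    lines = text.splitlines()
    return ["\n".join(lines[i:i + max_lines]) for i in range(0, len(lines), max_lines)]


def chunk_file(path, max_lines=50):
    """Read one file and return chunk records with basic metadata (hypothetical schema)."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return [
        {"source": str(path), "chunk_index": i, "text": chunk}
        for i, chunk in enumerate(chunk_text(text, max_lines))
    ]


def chunk_files_to_jsonl(paths, out_path, max_lines=50, workers=4):
    """Chunk files in parallel threads and write one JSON object per line."""
    count = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so output order is deterministic.
        results = pool.map(lambda p: chunk_file(p, max_lines), paths)
        with open(out_path, "w", encoding="utf-8") as out:
            for records in results:
                for record in records:
                    out.write(json.dumps(record, ensure_ascii=False) + "\n")
                    count += 1
    return count
```

Each line of the resulting file is an independent JSON object, so a downstream loader can stream it record by record. A real chunker would likely split on syntactic or semantic boundaries rather than a fixed line count.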

No commits in the last 6 months.

Use this if you need to break down large code repositories or document sets into manageable, high-quality chunks that provide maximum value to an LLM.

Not ideal if you only need to perform basic text splitting without any of the advanced filtering, metadata extraction, or LLM-specific optimizations.

LLM-fine-tuning RAG-system-prep code-analysis document-processing AI-engineering

Stale 6m No Package No Dependents

Maintenance 2 / 25

Adoption 8 / 25

Maturity 16 / 25

Community 3 / 25

Language

Python

License

Apache-2.0

Higher-rated alternatives

icereed/paperless-gpt

Use LLMs and LLM Vision (OCR) to handle paperless-ngx - Document Digitalization powered by AI

CTU-LinguTechies/VN-Law-Advisor

An application for looking up and answering questions about legal knowledge, based on the Vietnamese Legal Code (Bộ pháp điển) and the database of Vietnamese legal normative documents.

SharanyaAchanta/LexTransition-AI

LexTransition AI is an open-source, offline-first legal assistant. It helps users navigate the...

TusharSaini999/HackIndia-Spark-9-2025-Code-Hackers

VakeelAI is an AI-powered legal assistant that answers questions related to Indian laws. It...

lvwzhen/law-cn-ai

⚖️ AI legal assistant
