duriantaco/pykomodo

A Python-based parallel file chunking system designed for processing large codebases into LLM-friendly chunks.

Score: 29 / 100 (Experimental)

This tool helps AI engineers and machine learning practitioners prepare large collections of source code and PDF documents for use with large language models (LLMs). It reads a range of file types, processes them in parallel, and outputs smaller, context-rich chunks optimized for LLMs, including a JSONL format compatible with tools like LangChain and LlamaIndex. It is aimed at those building LLM-powered applications or knowledge retrieval systems from extensive codebases.
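The parallel chunk-then-export flow described above can be illustrated with a minimal sketch using only the Python standard library. This is not pykomodo's API or its chunking strategy: the function names (`chunk_text`, `chunk_file`, `chunk_files`, `write_jsonl`), the fixed line-count splitting, and the JSONL field names (`source`, `chunk_index`, `text`) are all assumptions made for illustration.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from pathlib import Path


def chunk_text(text, max_lines=50):
    """Split text into chunks of at most max_lines lines (hypothetical strategy)."""
    lines = text.splitlines()
    return ["\n".join(lines[i:i + max_lines]) for i in range(0, len(lines), max_lines)]


def chunk_file(path, max_lines=50):
    """Read one file and return one record per chunk.

    The field names below are illustrative, not pykomodo's output schema.
    """
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return [
        {"source": str(path), "chunk_index": i, "text": chunk}
        for i, chunk in enumerate(chunk_text(text, max_lines))
    ]


def chunk_files(paths, max_lines=50, workers=4):
    """Chunk several files concurrently; results keep the input file order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_file = pool.map(partial(chunk_file, max_lines=max_lines), paths)
        return [record for records in per_file for record in records]


def write_jsonl(records, out_path):
    """Write one JSON object per line."""
    with open(out_path, "w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Threads suit this sketch because reading files is I/O-bound. A real tool doing heavier per-file work, such as PDF parsing, might prefer a process pool instead.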

No commits in the last 6 months.

Use this if you need to break down large code repositories or document sets into manageable, high-quality chunks that provide maximum value to an LLM.

Not ideal if you only need to perform basic text splitting without any of the advanced filtering, metadata extraction, or LLM-specific optimizations.

Tags: LLM-fine-tuning, RAG-system-prep, code-analysis, document-processing, AI-engineering
Status: Stale 6m, No Package, No Dependents
Maintenance 2 / 25
Adoption 8 / 25
Maturity 16 / 25
Community 3 / 25


Stars: 47
Forks: 1
Language: Python
License: Apache-2.0
Last pushed: Aug 13, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/duriantaco/pykomodo"

Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
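The same request can be made from Python. The sketch below uses only the standard library and mirrors the URL shape of the curl command. The helper names `build_url` and `fetch_quality` are hypothetical, and the response format is not documented here, so the code assumes a JSON body and falls back to returning the raw text.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_url(owner, repo, category="llm-tools"):
    """Build the endpoint URL in the same shape as the curl example."""
    return f"{BASE_URL}/{quote(category)}/{quote(owner)}/{quote(repo)}"


def fetch_quality(owner, repo, category="llm-tools", timeout=10.0):
    """Fetch the record for one repository.

    Assumption: the body is JSON. If it does not parse, the raw text is returned.
    """
    url = build_url(owner, repo, category)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = resp.read().decode("utf-8")
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return body


# Example call (performs a network request and counts against the daily limit):
# data = fetch_quality("duriantaco", "pykomodo")
```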