pedrokohler/github-repo-to-single-file
TypeScript CLI that pulls a GitHub repo and merges all text-like files into one clean .txt or .pdf. Skips binaries, streams progress, writes to /out. Ideal for LLMs and RAG: feed an entire codebase as a single artifact so models can use the repository context seamlessly.
This tool helps developers consolidate all the text-based files from a GitHub repository or local codebase into a single, clean text or PDF document. It takes a repository URL or local directory path as input and produces a merged file in the 'out/' directory. Software engineers or AI/ML practitioners preparing codebases for analysis or large language models would find this useful.
Use this if you need a unified view of an entire codebase for documentation, review, or as input to AI models, for example in retrieval-augmented generation (RAG) pipelines.
Not ideal if you need to preserve the original file structure, only care about a few specific files, or are working with non-text files like images or compiled binaries.
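The merge step described above can be sketched in a few lines of TypeScript. This is a minimal illustration, not the tool's actual implementation: the function names (`looksBinary`, `listFiles`, `mergeRepo`), the NUL-byte binary check, the skipped directory names, and the `=====` header format are all assumptions made for the example.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Assumed heuristic: a file is treated as binary if a NUL byte
// appears within its first 8000 bytes.
export function looksBinary(buf: Buffer): boolean {
  return buf.subarray(0, 8000).includes(0);
}

// Recursively collect regular files under `root` in a stable, name-sorted order.
// The skipped directory names are illustrative defaults.
export function listFiles(
  root: string,
  skipDirs: Set<string> = new Set([".git", "node_modules"]),
): string[] {
  const found: string[] = [];
  const walk = (dir: string): void => {
    const entries = fs
      .readdirSync(dir, { withFileTypes: true })
      .sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
    for (const entry of entries) {
      const full = path.join(dir, entry.name);
      if (entry.isDirectory()) {
        if (!skipDirs.has(entry.name)) walk(full);
      } else if (entry.isFile()) {
        found.push(full);
      }
    }
  };
  walk(root);
  return found;
}

// Concatenate every text-like file into one string, each preceded by a
// header line carrying its repository-relative path (hypothetical format).
export function mergeRepo(root: string): string {
  const parts: string[] = [];
  for (const file of listFiles(root)) {
    const buf = fs.readFileSync(file);
    if (looksBinary(buf)) continue; // skip binaries such as images
    const rel = path.relative(root, file).split(path.sep).join("/");
    parts.push(`===== ${rel} =====\n${buf.toString("utf8")}`);
  }
  return parts.join("\n\n");
}
```

Calling `mergeRepo(".")` returns the merged text, which could then be written to a file under `out/` with `fs.writeFileSync`. The per-file header keeps each path visible in the merged output even though the directory layout itself is flattened.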
Stars: 12
Forks: 1
Language: TypeScript
License: —
Category:
Last pushed: Dec 12, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/pedrokohler/github-repo-to-single-file"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
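The same request can be made from TypeScript. The sketch below assumes a runtime with a global `fetch` (such as Node.js 18 or later) and assumes the endpoint returns JSON; the response schema is not documented here, so the result is typed as `unknown`. The helper names `qualityUrl` and `fetchQuality` are hypothetical, and the `rag` path segment is simply copied from the curl example.

```typescript
// Base path taken from the curl example above.
const BASE_URL = "https://pt-edge.onrender.com/api/v1/quality";

// Build the endpoint URL for a repository. The default "rag" segment
// mirrors the curl example; other values are not confirmed by this page.
export function qualityUrl(owner: string, repo: string, segment: string = "rag"): string {
  return [BASE_URL, segment, owner, repo]
    .map((part, i) => (i === 0 ? part : encodeURIComponent(part)))
    .join("/");
}

// Fetch the listing data without an API key (the keyless tier).
// Assumption: the body is JSON; its shape is left untyped on purpose.
export async function fetchQuality(owner: string, repo: string): Promise<unknown> {
  const res = await fetch(qualityUrl(owner, repo));
  if (!res.ok) {
    throw new Error(`Request failed: ${res.status} ${res.statusText}`);
  }
  return res.json();
}
```

For example, `fetchQuality("pedrokohler", "github-repo-to-single-file")` requests the same URL as the curl command.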
Higher-rated alternatives
any4ai/AnyCrawl
AnyCrawl: A Node.js/TypeScript crawler that turns websites into LLM-ready data and extracts...
kreuzberg-dev/html-to-markdown
High performance and CommonMark compliant HTML to Markdown converter. Maintained by the...
ScrapeGraphAI/Scrapegraph-ai
Python scraper based on AI
adbar/trafilatura
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping,...
paulpierre/markdown-crawler
A multithreaded web crawler that recursively crawls a website and creates a markdown file...