sayakpaul/hf-codegen
A repository of Python scripts that scrape the code from the public repositories of the `huggingface` GitHub organization.
This project helps machine learning engineers or researchers gather a large collection of code from Hugging Face's public GitHub repositories. It takes publicly available codebases and processes them into a structured dataset. This dataset is designed to be used for training custom AI coding assistants or for code analysis tasks.
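As a rough illustration of the "codebases to structured dataset" step, here is a minimal sketch. It is not the repository's actual code: it assumes the repositories have already been cloned into one local directory (one subdirectory per repository), and the function name `collect_code_files` and the record fields `repo`, `path`, and `content` are hypothetical names chosen for this example.

```python
import os


def collect_code_files(root_dir, extensions=(".py",)):
    """Walk root_dir and return one record per matching source file.

    Assumption: root_dir contains one subdirectory per cloned repository.
    """
    extensions = tuple(extensions)
    records = []
    for dirpath, dirnames, filenames in os.walk(root_dir):
        # Prune version-control metadata so it is never descended into.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in sorted(filenames):
            if not name.endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8") as f:
                    content = f.read()
            except (UnicodeDecodeError, OSError):
                # Skip files that are unreadable or not valid UTF-8 text.
                continue
            rel_path = os.path.relpath(path, root_dir)
            records.append(
                {
                    # First path component is treated as the repository name.
                    "repo": rel_path.split(os.sep)[0],
                    "path": rel_path,
                    "content": content,
                }
            )
    return records
```

The resulting list of dictionaries can be written to JSON Lines or loaded into a dataset library of your choice. A real pipeline would likely also need deduplication and license filtering, which this sketch omits.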
No commits in the last 6 months.
Use this if you are an AI developer or researcher who needs a curated dataset of real-world Python code from a specific, high-quality source (Hugging Face) to train your own code-generating or code-understanding models.
Not ideal if you're looking for a simple plug-and-play coding assistant or if you need to analyze code from private repositories or other platforms.
Stars: 54
Forks: 21
Language: Python
License: —
Category:
Last pushed: Feb 27, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ai-coding/sayakpaul/hf-codegen"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
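The same request can be made from Python. The sketch below uses only the standard library and rebuilds the URL shown in the curl command. Two things are assumptions, not facts from this page: that the endpoint returns JSON, and the meaning of the three path segments, which are given the hypothetical parameter names `group`, `owner`, and `repo`.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(group, owner, repo):
    """Build the endpoint URL from its three path segments."""
    # Percent-encode each segment so unusual characters cannot alter the path.
    parts = [quote(part, safe="") for part in (group, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(group, owner, repo, timeout=10):
    """Fetch the quality record for one repository (unauthenticated)."""
    url = build_quality_url(group, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        body = response.read().decode("utf-8")
    # Assumption: the response body is JSON; the page does not say so.
    return json.loads(body)
```

Calling `fetch_quality("ai-coding", "sayakpaul", "hf-codegen")` requests the same URL as the curl command. Unauthenticated use is limited to 100 requests/day, so callers may want to cache results.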
Higher-rated alternatives
howerj/dbcc
CAN DBC to C (and CSV, JSON and XML) compiler using the mpc parser combinator library
JhnW/devana
Python package to parse and generate C/C++ code as context aware preprocessor.
biojppm/regen
Easy C++ reflection and code generation
SoftSec-KAIST/CodeAlchemist
CodeAlchemist: Semantics-Aware Code Generation to Find Vulnerabilities in JavaScript Engines (NDSS '19)
Samsung/UTopia
UT based automated fuzz driver generation