jacobwarren/social-media-ai-engineering-etl

Real-world AI engineering dataset creation, SFT fine-tuning, and GRPO alignment ETL pipeline.

Score: 30 / 100 (Emerging)

This project helps AI engineers and machine learning practitioners turn raw social media posts into structured datasets for fine-tuning large language models. Given social media data as input, it produces organized training splits (SFT and DPO) for building specialized models. It is aimed at people building custom LLMs, particularly models tuned to a specific writing style or to social media content.

No commits in the last 6 months.

Use this if you need to quickly and reliably create high-quality, task-specific datasets from social media content for training or fine-tuning large language models on an NVIDIA GPU.

Not ideal if you don't have access to a data-center NVIDIA GPU or if your primary goal is general-purpose LLM training without specific social media data or style constraints.

AI-engineering LLM-fine-tuning social-media-data-preparation machine-learning-engineering natural-language-processing
Stale 6m, No Package, No Dependents
Maintenance 2 / 25
Adoption 7 / 25
Maturity 15 / 25
Community 6 / 25


Stars: 33
Forks: 2
Language: Python
License: Apache-2.0
Last pushed: Aug 27, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/jacobwarren/social-media-ai-engineering-etl"

The API is open to everyone: 100 requests/day with no key needed, or 1,000 requests/day with a free key.