tracebloc/data-ingestors
tracebloc data pipeline for training/test dataset setup
This tool helps data scientists and ML engineers prepare raw data for machine learning model training and evaluation. It takes your raw image, text, tabular, or time-series data, validates and preprocesses it, then securely transfers the cleaned dataset into your Kubernetes cluster. Only metadata is synced to the tracebloc web app for visual management, so the data itself stays on your infrastructure.
Use this if you need a secure, streamlined way to get your raw datasets ready for AI model training within the tracebloc platform, while keeping sensitive data on your own servers.
Not ideal if you are looking for a standalone data-cleaning tool and do not intend to train models using the tracebloc platform and its Kubernetes integration.
Stars: 8
Forks: —
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 15, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/data-engineering/tracebloc/data-ingestors"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
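The same request can be made from Python. The following is a minimal sketch using only the standard library. The helper names `quality_url` and `fetch_quality` are hypothetical, the category/owner/repo path layout is inferred from the single curl URL above, and the JSON response format is an assumption, since this page does not document the response.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL (hypothetical helper).

    Assumes the path is <category>/<owner>/<repo>, as in the one
    example URL shown on this page.
    """
    # Percent-encode each segment so unusual characters cannot alter the path.
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Send a GET request and decode the body as JSON.

    Assumption: the endpoint returns JSON. Adjust the decoding if it does not.
    """
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (performs a network request, so it is left commented out):
# print(fetch_quality("data-engineering", "tracebloc", "data-ingestors"))
```

Because keyless access is limited to 100 requests/day, it may be worth caching the result locally rather than calling `fetch_quality` repeatedly.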
Higher-rated alternatives
PrefectHQ/prefect
Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
growthbook/growthbook
Open Source Feature Flags, Experimentation, and Product Analytics
koopjs/koop
Transform, query, and download geospatial data on the web.
pathwaycom/pathway
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
dagster-io/dagster
An orchestration platform for the development, production, and observation of data assets.