Uncategorized Data Engineering Tools
This list tracks 75 uncategorized tools. One scores 70 or above (Verified tier): the highest-rated, dagucloud/dagu, at 70/100 with 3,244 stars. Three of the top 10 are actively maintained.
Get the projects as JSON. The example request below is capped at 20 results by `limit=20`:
```shell
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=data-engineering&subcategory=uncategorized&limit=20"
```
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
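For scripted use, the same request can be issued with Python's standard library. This is a minimal sketch that assumes only what the curl example shows: the endpoint URL and the `domain`, `subcategory`, and `limit` query parameters. The helper names `build_quality_url` and `fetch_quality` are hypothetical. The response is returned as decoded JSON without assuming any field names, since its schema is not described on this page, and no API key is sent because the page does not say how a key is passed.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl example; everything else here is illustrative.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_quality_url(domain: str, subcategory: str, limit: int = 20) -> str:
    """Build the query URL, letting urlencode escape each parameter value."""
    params = {"domain": domain, "subcategory": subcategory, "limit": limit}
    return f"{BASE_URL}?{urllib.parse.urlencode(params)}"


def fetch_quality(domain: str, subcategory: str, limit: int = 20, timeout: float = 30.0):
    """Fetch the dataset and return the decoded JSON (dict or list) unchanged.

    Hypothetical helper: no field names are assumed, and no API key is sent
    because the key-passing mechanism is not documented here.
    """
    url = build_quality_url(domain, subcategory, limit)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `fetch_quality("data-engineering", "uncategorized")` sends the same request as the curl command. Building the query with `urllib.parse.urlencode` keeps values correctly escaped if you pass other domains or subcategories.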
| # | Tool | Description | Score | Tier |
|---|---|---|---|---|
| 1 | dagucloud/dagu | A local-first workflow engine built the way it should be: declarative,... | 70 | Verified |
| 2 | risesoft-y9/DataFlow-Engine | The Data Flow Engine is an underlying data-driven engine for data integration, data synchronization, data exchange, data sharing, task configuration, and task scheduling. It uses separated management and execution, multiple stream layers, and a plugin library to handle large-scale data... | | Established |
| 3 | insitro/redun | Yet another redundant workflow engine | | Established |
| 4 | hyparam/icebird | Icebird: JavaScript Iceberg Client | | Established |
| 5 | cnstlungu/portable-data-stack-dagster | A portable Datamart and Business Intelligence suite built with Docker,... | | Established |
| 6 | vibhorkum/pg_background | Production-grade PostgreSQL extension to execute arbitrary SQL in background... | | Established |
| 7 | cparmet/pandas-checks | 🐼🩺 Pandas Checks: Non-invasive health checks for Pandas method chains | | Established |
| 8 | uptake/uptasticsearch | An Elasticsearch client tailored to data science workflows. | | Established |
| 9 | snowplow/enrich | Snowplow Enrichment jobs and library | | Established |
| 10 | snowplow/dbt-snowplow-web | A fully incremental model that transforms raw web event data generated by... | | Established |
| 11 | ICIJ/extract | A cross-platform command line tool for parallelised content extraction and analysis. | | Established |
| 12 | mozilla/python_mozetl | ETL jobs for Firefox Telemetry | | Established |
| 13 | nodestream-proj/nodestream | A declarative framework for building, maintaining, and analyzing graph data | | Established |
| 14 | cnstlungu/portable-data-stack-mage | A portable Datamart and Business Intelligence suite built with Docker, Mage,... | | Established |
| 15 | apache/doris-kafka-connector | Kafka Connector for Apache Doris | | Established |
| 16 | cnstlungu/portable-data-stack-airflow | A portable Datamart and Business Intelligence suite built with Docker,... | | Established |
| 17 | caiopizzol/cnpj-data-pipeline | Open-source pipeline that downloads and processes Receita Federal (Brazil's federal revenue service) data into PostgreSQL | | Established |
| 18 | zazuko/barnard59 | An intuitive and flexible RDF pipeline solution designed to simplify and... | | Established |
| 19 | evdubs/oic-options-chains | ETL for OIC Options Chains | | Established |
| 20 | PFund-Software-Ltd/pfeed | Data Engine for Manual/Algo Trading: Download/Stream -> Clean -> Store.... | | Established |
| 21 | edanalytics/earthmover | CLI tool for transforming collections of tabular source data into a variety... | | Established |
| 22 | dtmirizzi/target-elasticsearch | A Meltano target for Elasticsearch | | Established |
| 23 | bmeares/Meerschaum | Create and manage data pipes with Meerschaum. | | Emerging |
| 24 | DataBora/elusion | DataFrame / Data Engineering Library with familiar syntax like ones we love:... | | Emerging |
| 25 | snowplow/dbt-snowplow-normalize | A dbt package to support modelling event data via split tables for use in... | | Emerging |
| 26 | joonan-lab/cwas | Category-wide association study (CWAS) (Werling et al., 2018; An et al., 2018) | | Emerging |
| 27 | abdubakr77/deepcsv | Automatically processes data files in directories, converts array-like... | | Emerging |
| 28 | sipist/sipist-workspace | This repository provides containerized applications and microservices for... | | Emerging |
| 29 | bdist/bdist-workspace | This repository provides containerized applications and microservices for... | | Emerging |
| 30 | bastienboutonnet/sheetwork | A handy package to load Google Sheets to your database right from the CLI... | | Emerging |
| 31 | samber/awesome-olap | 🧊 A curated list of OLAP databases, data lake tools, columnar engines, and... | | Emerging |
| 32 | jitsucom/bulker | Service for bulk-loading data to databases with automatic schema management... | | Emerging |
| 33 | Trojan3877/AWS-SageMaker-Snowflake-ML-Pipeline | The **AWS SageMaker + Snowflake ML Pipeline** is a fully production-grade,... | | Emerging |
| 34 | cloverdx/cloverdx-server-docker | CloverDX Docker container for CloverDX Server deployment including examples. | | Emerging |
| 35 | emeraldpay/dshackle-archive | ETL for Bitcoin and Ethereum data | | Emerging |
| 36 | cnstlungu/postcard-company-datamart | Learning-by-doing data model built with dbt-core | | Emerging |
| 37 | zappzerapp/laravel-ingest | A robust, configuration-driven ETL and data import framework for Laravel.... | | Emerging |
| 38 | jrlasak/awesome-databricks | 170+ curated resources every Databricks Data Engineer should bookmark -... | | Emerging |
| 39 | fairtracks/omnipy | Omnipy is a high-level Python library for type-driven data wrangling and... | | Emerging |
| 40 | govtech-data-practice/vowl | A validation engine for Open Data Contract Standard (ODCS) data contracts.... | | Emerging |
| 41 | nshkrdotcom/flowstone | Asset-first data orchestration for Elixir/BEAM. Dagster-inspired with OTP... | | Emerging |
| 42 | AbdullahEmad22/realtime-data-engineering-project | An end-to-end data engineering pipeline that orchestrates data ingestion,... | | Emerging |
| 43 | AlvaroCavalcante/airflow-parse-bench | Stop creating bad DAGs! Use this tool to measure and compare the parse time... | | Emerging |
| 44 | root-11/tablite | Multiprocessing-enabled out-of-memory data analysis library for tabular data. | | Emerging |
| 45 | chnm/bom | Website files, database GUI, and data pipeline scripts for the London Bills... | | Emerging |
| 46 | atolcd/sdis-remocra | 🔥 Remocra: an open-source business platform designed by and for the SDIS (French departmental fire and rescue services). | | Emerging |
| 47 | astronomer/cosmos-ebook-companion | Companion repository to the Practical Guide: Orchestrating dbt with Apache... | | Emerging |
| 48 | provero-org/provero | Declarative data quality engine. Define checks in YAML, run anywhere. | | Emerging |
| 49 | The-Pulse-Engine/Pulse-Engine_Market_Intelligence_Platform | An explainable market analysis system that combines technical indicators and... | | Emerging |
| 50 | PkLavc/PkLavc.github.io | PkLavc Portfolio \| Solutions & Integration Architect (Technical Owner).... | | Emerging |
| 51 | limhaneul12/kafka-gov | Open-Source Apache Kafka Governance Platform | | Emerging |
| 52 | Codex-Crusader/le_Market_Intelligence_Platform | An explainable market analysis system that combines technical indicators and... | | Emerging |
| 53 | justvinhhere/bigquery-expert | Claude Code plugin that makes Claude a BigQuery expert. 5 skills covering... | | Emerging |
| 54 | guotong1988/Automatic-Label-Error-Correction | Automatic Label Error Correction www.techrxiv.org/users/679328/articles/731085 | | Emerging |
| 55 | pr1m8/haive-dataflow | Data processing pipelines and ETL workflows for Haive agents | | Emerging |
| 56 | caiopizzol/fipe-data-pipeline | Collects and processes historical price data from the FIPE Table (Tabela FIPE) into PostgreSQL. | | Emerging |
| 57 | MTSWebServices/spark-dialect-extension | Extends JDBC type support for Apache Spark. | | Emerging |
| 58 | Ryanditko/Roadmap-Projects | A comprehensive collection of 180 curated project ideas across 6 technology... | | Emerging |
| 59 | worldbank/OvertureLink-Data-Pipeline | This ETL pipeline allows you to query and extract Overture Maps data (such... | | Emerging |
| 60 | masthead-data/terraform-google-masthead-agent | Google Cloud resources for Masthead Data agent integration. | | Emerging |
| 61 | Joerndm/stock_portefolio_builder | Using Machine Learning to predict future stock prices and creating a stock... | | Emerging |
| 62 | docglow/docglow | Modern documentation site generator for dbt Core: lineage explorer, health... | | Emerging |
| 63 | cnstlungu/portable-data-stack-bruin | A portable Datamart and Business Intelligence suite built with Docker,... | | Emerging |
| 64 | granthjoshi01/AQI-Analysis-Project | End-to-end AQI data pipeline with automated collection, historical storage,... | | Emerging |
| 65 | Trojan3877/diabetes-prediction-ml-pipeline | The Diabetes Prediction ML Pipeline is a production-ready end-to-end... | | Emerging |
| 66 | AlvaroCavalcante/airflow-calendar-plugin | A Google Calendar-style plugin to improve your DAG management with a visual schedule | | Experimental |
| 67 | nitish9413/open_auto_loader | OpenAutoLoader: A lightweight, open-source alternative to Databricks Auto... | | Experimental |
| 68 | feitasIoT/CRose | CRose(China... | | Experimental |
| 69 | TheoV823/cannabis-price-index | Open-source methodology, SQL, and sample data for a Cannabis Price Index.... | | Experimental |
| 70 | erangi/podcasts | The list of podcasts I listen to | | Experimental |
| 71 | osodevops/k2i | K2I - Kafka to Iceberg streaming ingestion engine. A Rust CLI tool inspired... | | Experimental |
| 72 | drogba0027/dev-resources-hub | Dev Resources Hub is a curated collection of free frontend, backend, UI/UX... | | Experimental |
| 73 | josephmachado/airflow-tutorial | Code for Airflow 3.0 Tutorial | | Experimental |
| 74 | Biswajit107927/data-platform-quicksight | End-to-end AWS data platform: Kinesis → Glue → Iceberg → Redshift →... | | Experimental |
| 75 | takers2018/medical-indication-market-sizing-scraper | Demonstrates a practical data product: headless JS fetchers capture dynamic... | | Experimental |