Uncategorized Data Engineering Tools

There are 75 uncategorized tools tracked. One scores 70 or higher (Verified tier). The highest-rated is dagucloud/dagu at 70/100 with 3,244 stars. Three of the top 10 are actively maintained.

Get the projects as JSON (note that the example request sets limit=20, not 75)

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=data-engineering&subcategory=uncategorized&limit=20"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
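The same request can be issued from a script. The following is a minimal Python sketch, assuming only the three query parameters visible in the curl example (`domain`, `subcategory`, `limit`). The helper names `build_quality_url` and `fetch_quality` are hypothetical, and the shape of the JSON response is not documented here, so the sketch returns the decoded body without interpreting it.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_quality_url(domain, subcategory, limit):
    """Build the request URL from the three parameters seen in the curl example."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_quality(domain="data-engineering", subcategory="uncategorized",
                  limit=20, timeout=30):
    """Fetch the dataset and decode the JSON body.

    The response schema is an unknown here, so the decoded value is
    returned as-is rather than unpacked into specific fields.
    """
    url = build_quality_url(domain, subcategory, limit)
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_quality()` needs network access and counts against the daily request allowance, so it is worth caching the result locally rather than re-fetching on every run.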

| # | Tool | Description | Score | Tier |
|---|------|-------------|-------|------|
| 1 | dagucloud/dagu | A local-first workflow engine built the way it should be: declarative,... | 70 | Verified |
| 2 | risesoft-y9/DataFlow-Engine | DataFlow Engine is a low-level data-driven engine for data integration, data synchronization, data exchange, data sharing, task configuration, and task scheduling. It uses separation of management and execution, multiple flow layers, a plugin library, and similar mechanisms to handle large-scale... | 64 | Established |
| 3 | insitro/redun | Yet another redundant workflow engine | 63 | Established |
| 4 | hyparam/icebird | Icebird: JavaScript Iceberg Client | 61 | Established |
| 5 | cnstlungu/portable-data-stack-dagster | A portable Datamart and Business Intelligence suite built with Docker,... | 60 | Established |
| 6 | vibhorkum/pg_background | Production-grade PostgreSQL extension to execute arbitrary SQL in background... | 59 | Established |
| 7 | cparmet/pandas-checks | 🐼🩺 Pandas Checks: Non-invasive health checks for Pandas method chains | 59 | Established |
| 8 | uptake/uptasticsearch | An Elasticsearch client tailored to data science workflows. | 58 | Established |
| 9 | snowplow/enrich | Snowplow Enrichment jobs and library | 58 | Established |
| 10 | snowplow/dbt-snowplow-web | A fully incremental model, that transforms raw web event data generated by... | 56 | Established |
| 11 | ICIJ/extract | A cross-platform command line tool for parallelised content extraction and analysis. | 56 | Established |
| 12 | mozilla/python_mozetl | ETL jobs for Firefox Telemetry | 56 | Established |
| 13 | nodestream-proj/nodestream | A Declarative framework for Building, Maintaining, and Analyzing Graph Data | 55 | Established |
| 14 | cnstlungu/portable-data-stack-mage | A portable Datamart and Business Intelligence suite built with Docker, Mage,... | 54 | Established |
| 15 | apache/doris-kafka-connector | Kafka Connector for Apache Doris | 54 | Established |
| 16 | cnstlungu/portable-data-stack-airflow | A portable Datamart and Business Intelligence suite built with Docker,... | 54 | Established |
| 17 | caiopizzol/cnpj-data-pipeline | Open-source pipeline that downloads and processes Receita Federal (Brazilian federal revenue service) data into PostgreSQL | 52 | Established |
| 18 | zazuko/barnard59 | An intuitive and flexible RDF pipeline solution designed to simplify and... | 52 | Established |
| 19 | evdubs/oic-options-chains | ETL for OIC Options Chains | 52 | Established |
| 20 | PFund-Software-Ltd/pfeed | Data Engine for Manual/Algo Trading: Download/Stream -> Clean -> Store.... | 52 | Established |
| 21 | edanalytics/earthmover | CLI tool for transforming collections of tabular source data into a variety... | 51 | Established |
| 22 | dtmirizzi/target-elasticsearch | A Meltano target for Elasticsearch | 50 | Established |
| 23 | bmeares/Meerschaum | Create and manage data pipes with Meerschaum. | 49 | Emerging |
| 24 | DataBora/elusion | DataFrame / Data Engineering Library with familiar syntax like ones we love:... | 49 | Emerging |
| 25 | snowplow/dbt-snowplow-normalize | A dbt package to support modelling event data via split tables for use in... | 47 | Emerging |
| 26 | joonan-lab/cwas | Category-wide association study (CWAS) (Werling et al., 2018; An et al., 2018) | 47 | Emerging |
| 27 | abdubakr77/deepcsv | Automatically processes data files in directories, converts array-like... | 47 | Emerging |
| 28 | sipist/sipist-workspace | This repository provides containerized applications and microservices for... | 45 | Emerging |
| 29 | bdist/bdist-workspace | This repository provides containerized applications and microservices for... | 45 | Emerging |
| 30 | bastienboutonnet/sheetwork | A handy package to load Google Sheets to your database right from the CLI... | 45 | Emerging |
| 31 | samber/awesome-olap | 🧊 A curated list of OLAP databases, data lake tools, columnar engines, and... | 45 | Emerging |
| 32 | jitsucom/bulker | Service for bulk-loading data to databases with automatic schema management... | 44 | Emerging |
| 33 | Trojan3877/AWS-SageMaker-Snowflake-ML-Pipeline | The **AWS SageMaker + Snowflake ML Pipeline** is a fully production-grade,... | 43 | Emerging |
| 34 | cloverdx/cloverdx-server-docker | CloverDX Docker container for CloverDX Server deployment including examples. | 43 | Emerging |
| 35 | emeraldpay/dshackle-archive | ETL for Bitcoin and Ethereum data | 43 | Emerging |
| 36 | cnstlungu/postcard-company-datamart | learning-by-doing data model built with dbt-core | 42 | Emerging |
| 37 | zappzerapp/laravel-ingest | A robust, configuration-driven ETL and data import framework for Laravel.... | 42 | Emerging |
| 38 | jrlasak/awesome-databricks | 170+ curated resources every Databricks Data Engineer should bookmark -... | 41 | Emerging |
| 39 | fairtracks/omnipy | Omnipy is a high level Python library for type-driven data wrangling and... | 40 | Emerging |
| 40 | govtech-data-practice/vowl | A validation engine for Open Data Contract Standard (ODCS) data contracts.... | 40 | Emerging |
| 41 | nshkrdotcom/flowstone | Asset-first data orchestration for Elixir/BEAM. Dagster-inspired with OTP... | 40 | Emerging |
| 42 | AbdullahEmad22/realtime-data-engineering-project | An end-to-end data engineering pipeline that orchestrates data ingestion,... | 39 | Emerging |
| 43 | AlvaroCavalcante/airflow-parse-bench | Stop creating bad DAGs! Use this tool to measure and compare the parse time... | 39 | Emerging |
| 44 | root-11/tablite | multiprocessing enabled out-of-memory data analysis library for tabular data. | 38 | Emerging |
| 45 | chnm/bom | Website files, database GUI, and data pipeline scripts for the London Bills... | 38 | Emerging |
| 46 | atolcd/sdis-remocra | 🔥 Remocra - Open-source business platform designed by and for SDIS (French departmental fire and rescue services). | 37 | Emerging |
| 47 | astronomer/cosmos-ebook-companion | Companion repository to the Practical Guide: Orchestrating dbt with Apache... | 36 | Emerging |
| 48 | provero-org/provero | Declarative data quality engine. Define checks in YAML, run anywhere. | 36 | Emerging |
| 49 | The-Pulse-Engine/Pulse-Engine_Market_Intelligence_Platform | An explainable market analysis system that combines technical indicators and... | 34 | Emerging |
| 50 | PkLavc/PkLavc.github.io | PkLavc Portfolio \| Solutions & Integration Architect (Technical Owner).... | 34 | Emerging |
| 51 | limhaneul12/kafka-gov | Open-Source Apache Kafka Governance Platform | 34 | Emerging |
| 52 | Codex-Crusader/le_Market_Intelligence_Platform | An explainable market analysis system that combines technical indicators and... | 34 | Emerging |
| 53 | justvinhhere/bigquery-expert | Claude Code plugin that makes Claude a BigQuery expert. 5 skills covering... | 34 | Emerging |
| 54 | guotong1988/Automatic-Label-Error-Correction | Automatic Label Error Correction www.techrxiv.org/users/679328/articles/731085 | 33 | Emerging |
| 55 | pr1m8/haive-dataflow | Data processing pipelines and ETL workflows for Haive agents | 33 | Emerging |
| 56 | caiopizzol/fipe-data-pipeline | Collects and processes historical price data from the Tabela FIPE into PostgreSQL. | 33 | Emerging |
| 57 | MTSWebServices/spark-dialect-extension | Extend JDBC types support for Apache Spark. | 33 | Emerging |
| 58 | Ryanditko/Roadmap-Projects | A comprehensive collection of 180 curated project ideas across 6 technology... | 32 | Emerging |
| 59 | worldbank/OvertureLink-Data-Pipeline | This ETL pipeline allows you to query and extract Overture Maps data (such... | 32 | Emerging |
| 60 | masthead-data/terraform-google-masthead-agent | Google Cloud resources for Masthead Data agent integration. | 32 | Emerging |
| 61 | Joerndm/stock_portefolio_builder | Using Machine Learning to predict future stock prices and creating a stock... | 32 | Emerging |
| 62 | docglow/docglow | Modern documentation site generator for dbt Core: lineage explorer, health... | 31 | Emerging |
| 63 | cnstlungu/portable-data-stack-bruin | A portable Datamart and Business Intelligence suite built with Docker,... | 31 | Emerging |
| 64 | granthjoshi01/AQI-Analysis-Project | End-to-end AQI data pipeline with automated collection, historical storage,... | 31 | Emerging |
| 65 | Trojan3877/diabetes-prediction-ml-pipeline | The Diabetes Prediction ML Pipeline is a production-ready end-to-end... | 31 | Emerging |
| 66 | AlvaroCavalcante/airflow-calendar-plugin | A Google Calendar-style plugin to improve your DAG management with a visual schedule | 28 | Experimental |
| 67 | nitish9413/open_auto_loader | OpenAutoLoader: A lightweight, open-source alternative to Databricks Auto... | 27 | Experimental |
| 68 | feitasIoT/CRose | CRose(China... | 27 | Experimental |
| 69 | TheoV823/cannabis-price-index | Open-source methodology, SQL, and sample data for a Cannabis Price Index.... | 25 | Experimental |
| 70 | erangi/podcasts | The list of podcasts I listen to | 24 | Experimental |
| 71 | osodevops/k2i | K2I - Kafka to Iceberg streaming ingestion engine. A Rust CLI tool inspired... | 23 | Experimental |
| 72 | drogba0027/dev-resources-hub | Dev Resources Hub is a curated collection of free frontend, backend, UI/UX... | 21 | Experimental |
| 73 | josephmachado/airflow-tutorial | Code for Airflow 3.0 Tutorial | 19 | Experimental |
| 74 | Biswajit107927/data-platform-quicksight | End-to-end AWS data platform: Kinesis → Glue → Iceberg → Redshift →... | 17 | Experimental |
| 75 | takers2018/medical-indication-market-sizing-scraper | Demonstrates a practical data product: headless JS fetchers capture dynamic... | 13 | Experimental |
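
The tier labels in the listing appear to follow score bands. The Python sketch below assumes cut-offs of 70, 50, and 30, which are inferred from the scores shown (70 is Verified, 50 is Established, 49 and 31 are Emerging, 28 is Experimental) and are not a documented rule. No listed tool scores exactly 30 or between 65 and 69, so the precise boundaries cannot be confirmed from this page. The names `TIER_FLOORS` and `tier_for` are hypothetical.

```python
from collections import Counter

# Assumed lower bounds per tier, inferred from the listing above.
# These are NOT documented thresholds; adjust them if the source publishes its own.
TIER_FLOORS = [(70, "Verified"), (50, "Established"), (30, "Emerging")]


def tier_for(score):
    """Map a 0-100 score to a tier name using the assumed floors."""
    for floor, tier in TIER_FLOORS:
        if score >= floor:
            return tier
    return "Experimental"


# A few (tool, score) pairs copied from the listing, chosen near the band edges.
sample_scores = {
    "dagucloud/dagu": 70,
    "dtmirizzi/target-elasticsearch": 50,
    "bmeares/Meerschaum": 49,
    "Trojan3877/diabetes-prediction-ml-pipeline": 31,
    "AlvaroCavalcante/airflow-calendar-plugin": 28,
}

# Count how many of the sampled tools fall into each tier.
tier_counts = Counter(tier_for(score) for score in sample_scores.values())
```

Applied to the five sampled rows, the mapping reproduces the tier printed next to each of them in the listing.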