Data Pipeline Frameworks: Data Engineering Tools

Tools for building, deploying, and orchestrating end-to-end data workflows (ETL/ELT, transformations, ingestion). Does NOT include SQL learning resources, individual data connectors, or general-purpose query engines.

261 data pipeline framework tools are tracked. 36 score 70 or above (Verified tier). The highest-rated is PrefectHQ/prefect at 95/100, with 21,898 stars and 9,593,004 monthly downloads. 9 of the top 10 are actively maintained.
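The tier labels in the list below follow fixed score bands. Reading them off the listing (the lowest Verified score is 70, Established runs from 50 to 69, Emerging from 30 to 49, and Experimental covers 29 and under), the mapping can be sketched as a small shell function. The cutoffs are inferred from the published scores, not from any documented scoring rule, and the function name tier_for_score is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: map a 0-100 score to the tier label shown in the listing.
# Cutoffs are inferred from the scores and tiers in the list, not from documentation.
tier_for_score() {
  score="$1"
  if [ "$score" -ge 70 ]; then
    echo "Verified"        # 70 and up (36 tools in this listing)
  elif [ "$score" -ge 50 ]; then
    echo "Established"     # 50-69
  elif [ "$score" -ge 30 ]; then
    echo "Emerging"        # 30-49
  else
    echo "Experimental"    # below 30
  fi
}

# Boundary cases taken from the list: 70 Verified, 69 Established, 49 Emerging, 29 Experimental.
for s in 70 69 49 29; do
  printf '%s -> %s\n' "$s" "$(tier_for_score "$s")"
done
```

The boundary values in the loop are the ones where adjacent entries in the list switch tiers, so they are the cases most worth checking if the thresholds ever change.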

Get the projects as JSON (the example request below sets limit=20, so it returns at most 20 of the 261 projects)

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=data-engineering&subcategory=data-pipeline-frameworks&limit=20"

Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
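The query parameters can be assembled from variables so the same request works for other slices of the dataset. This is a minimal sketch: build_quality_url is a hypothetical helper, the parameter names (domain, subcategory, limit) are copied from the curl example above, the values are assumed to already be URL-safe slugs so no percent-encoding is done, and the largest limit the API accepts is not stated on this page. How a free key is sent is also not described here, so the sketch only builds anonymous requests.

```shell
#!/bin/sh
# Hypothetical helper: build the dataset query URL used in the curl example.
# Arguments: domain, subcategory, optional limit (defaults to 20, as in the example).
# Values are inserted as-is, so they must already be URL-safe slugs.
build_quality_url() {
  domain="$1"
  subcategory="$2"
  limit="${3:-20}"
  printf 'https://pt-edge.onrender.com/api/v1/datasets/quality?domain=%s&subcategory=%s&limit=%s\n' \
    "$domain" "$subcategory" "$limit"
}

url="$(build_quality_url data-engineering data-pipeline-frameworks)"
echo "$url"
# To fetch (counts against the 100 requests/day anonymous quota):
# curl "$url"
```

Because the URL contains & characters, keep it quoted when passing it to curl, as the original example does; unquoted, the shell would treat each & as a background operator.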

Rank  Tool  Score  Tier
1 PrefectHQ/prefect

Prefect is a workflow orchestration framework for building resilient data...

95
Verified
2 growthbook/growthbook

Open Source Feature Flags, Experimentation, and Product Analytics

90
Verified
3 koopjs/koop

Transform, query, and download geospatial data on the web.

89
Verified
4 pathwaycom/pathway

Python ETL framework for stream processing, real-time analytics, LLM...

85
Verified
5 dagster-io/dagster

An orchestration platform for the development, production, and observation...

84
Verified
6 dlt-hub/dlt

data load tool (dlt) is an open source Python library that makes data...

80
Verified
7 meltano/meltano

Meltano: the declarative code-first data integration engine that powers your...

79
Verified
8 capitalone/locopy

locopy: Loading/Unloading to Redshift and Snowflake using Python.

79
Verified
9 apache/hop

Hop Orchestration Platform

76
Verified
10 apache/superset

Apache Superset is a Data Visualization and Data Exploration Platform

76
Verified
11 airbytehq/airbyte

The leading data integration platform for ETL / ELT data pipelines from...

76
Verified
12 pyjanitor-devs/pyjanitor

Clean APIs for data cleaning. Python implementation of R package Janitor

76
Verified
13 apache/shardingsphere

Empowering Data Intelligence with Distributed SQL for Sharding, Scalability,...

76
Verified
14 catalyst-cooperative/pudl

The Public Utility Data Liberation Project provides analysis-ready energy...

76
Verified
15 debezium/debezium

Change data capture for a variety of databases. Please log issues at...

76
Verified
16 quiltdata/quilt

Quilt is a Scientific Data Management Platform on AWS that helps teams and...

75
Verified
17 bruin-data/ingestr

ingestr is a CLI tool to copy data between any databases with a single...

74
Verified
18 apache/incubator-devlake

Apache DevLake is an open-source dev data platform to ingest, analyze, and...

74
Verified
19 databricks/dbt-databricks

A dbt adapter for Databricks.

74
Verified
20 odpi/egeria

Egeria core

74
Verified
21 apache/flink-cdc

Flink CDC is a streaming data integration tool

74
Verified
22 dathere/qsv

Blazing-fast Data-Wrangling toolkit

73
Verified
23 datazip-inc/olake

OLake - Fastest Databases, Kafka & S3 Replication to Apache Iceberg or Plain...

72
Verified
24 nordquant/complete-dbt-bootcamp-zero-to-hero

Supplementary Materials for The Complete dbt (Data Build Tool) Bootcamp...

71
Verified
25 nightscape/spark-excel

A Spark plugin for reading and writing Excel files

71
Verified
26 datavane/tis

Support agile DataOps Based on Flink, DataX and Flink-CDC, Chunjun with Web-UI

71
Verified
27 vectordotdev/vector

A high-performance observability data pipeline.

71
Verified
28 ariacom/Seal-Report

Database Reporting Tool and Tasks (.Net)

71
Verified
29 dataform-co/dataform

Dataform is a framework for managing SQL based data operations in BigQuery

71
Verified
30 datavane/datavines

Know your data better! Datavines is a next-gen data observability platform,...

71
Verified
31 wgzhao/Addax

A fast and versatile ETL tool that can transfer data between RDBMS and NoSQL...

71
Verified
32 sodadata/soda-core

Data Contracts engine for the modern data stack. https://www.soda.io

70
Verified
33 cloudquery/cloudquery

Data pipelines for cloud config and security data. Build cloud asset...

70
Verified
34 risingwavelabs/risingwave

Event streaming platform for agents, apps, and analytics. Continuously...

70
Verified
35 dagu-org/dagu

A local-first workflow engine built the way it should be: declarative,...

70
Verified
36 aws/aws-sdk-pandas

pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream,...

70
Verified
37 datajoint/datajoint-python

Relational data pipelines for the science lab

69
Established
38 snowplow/snowplow

The leader in Customer Data Infrastructure

69
Established
39 PeerDB-io/peerdb

Fast, simple, and cost-effective tool to replicate data from Postgres to...

69
Established
40 treeverse/lakeFS

lakeFS - Data version control for your data lake | Git for data

68
Established
41 apecloud/ape-dts

ApeCloud's Data Transfer Suite, written in Rust. Provides ultra-fast data...

68
Established
42 iBridges-for-iRODS/iBridges

A wrapper around the python-irodsclient to allow for easy interaction with...

67
Established
43 SQLMesh/sqlmesh

Scalable and efficient data transformation framework - backwards compatible with dbt.

67
Established
44 amphi-ai/amphi-etl

visual data prep powered by python

67
Established
45 scribe-org/Scribe-Data

Wikidata and Wiktionary language data extraction

67
Established
46 vietvudanh/vietlott-data

Automation fetching data for Vietlott. Just for fun.

67
Established
47 datacleaner/DataCleaner

The premier open source Data Quality solution

67
Established
48 elementary-data/elementary

The dbt-native data observability solution for data & analytics engineers....

66
Established
49 networktocode/diffsync

A utility library for comparing and synchronizing different datasets.

66
Established
50 dotflow-io/dotflow

🎲 Business Logic Code in a flow!

65
Established
51 biglocalnews/warn-transformer

Consolidate, enrich and republish the data gathered by warn-scraper

65
Established
52 slingdata-io/sling-cli

Sling is a CLI tool that extracts data from a source storage/database and...

65
Established
53 timeplus-io/proton

⚡ Fastest SQL ETL pipeline in a single C++ binary, built for stream...

65
Established
54 snowflakedb/snowpark-python

Snowflake Snowpark Python API

64
Established
55 biglocalnews/warn-scraper

Command-line interface for downloading WARN Act notices of qualified plant...

64
Established
56 fedspendingtransparency/usaspending-api

Server application to serve U.S. federal spending data via a RESTful API

64
Established
57 Desbordante/desbordante-core

Desbordante is a high-performance data profiler that is capable of...

63
Established
58 laminlabs/lamindb

Open-source data framework for biology. Context and memory for datasets and...

63
Established
59 bitpicky/dbt-sugar

dbt-sugar is a CLI tool that allows users of dbt to have fun and ease...

63
Established
60 datagouv/csv-detective

Inspection of tabular (csv, xls-like) files to guess the columns' content

63
Established
61 osalvador/ReplicaDB

ReplicaDB is an open-source tool for database replication, designed for...

63
Established
62 ConduitIO/conduit

Conduit streams data between data stores. Kafka Connect replacement. No JVM required.

63
Established
63 redpanda-data/connect

Fancy stream processing made operationally mundane

63
Established
64 ohs-foundation/fhir-data-pipes

A collection of tools for extracting FHIR resources and analytics services...

63
Established
65 data-engineering-community/data-engineering-wiki

The best place to learn data engineering. Built and maintained by the data...

63
Established
66 TianLangStudio/DataXServer

Provides remote multi-language invocation (ThriftServer, HttpServer) for DataX (https://github.com/alibaba/DataX)...

62
Established
67 Multiwoven/multiwoven

🔥🔥🔥 Open source Reverse ETL - alternative to hightouch and census.

62
Established
68 fkie-cad/Logprep

Log data preprocessing, generation, and shipping in Python

62
Established
69 edkreuk/FMD_FRAMEWORK

The Fabric Metadata-Driven Framework (FMD) is a cutting-edge accelerator...

62
Established
70 opendatadiscovery/odd-platform

First open-source data discovery and observability platform. We make a life...

62
Established
71 astronomer/airflow-provider-fivetran-async

A new Airflow Provider for Fivetran, maintained by Astronomer and Fivetran

62
Established
72 cre-dev/xml2db

A Python package to load complex XML files into a relational database

62
Established
73 stn1slv/awesome-integration

A curated list of awesome system integration software and resources.

62
Established
74 airbytehq/PyAirbyte

PyAirbyte brings the power of Airbyte to every Python developer.

62
Established
75 neo4j/neo4j-jdbc

Official Neo4j JDBC Driver

61
Established
76 dlt-hub/verified-sources

Contribute to dlt verified sources 🔥

61
Established
77 Breeze0806/go-etl

go-etl is a toolset for data extraction, transformation and loading.

61
Established
78 DataKitchen/dataops-testgen

DataOps Data Quality TestGen is part of DataKitchen's Open Source Data...

61
Established
79 HTTP-RPC/Kilo

Lightweight REST for Java

60
Established
80 bitol-io/open-data-contract-standard

Home of the Open Data Contract Standard (ODCS).

60
Established
81 linkedpipes/etl

LinkedPipes ETL is an RDF based, lightweight ETL tool

59
Established
82 vmware/versatile-data-kit

One framework to develop, deploy and operate data workflows with Python and SQL.

59
Established
83 bacalhau-project/bacalhau

Community-driven, simple, yet powerful framework for fast, cost-effective...

59
Established
84 metafacture/metafacture-core

Core package of the Metafacture tool suite for metadata processing.

59
Established
85 AbsaOSS/cobrix

A COBOL parser and Mainframe/EBCDIC data source for Apache Spark

59
Established
86 dalenewman/Transformalize

Configurable Extract, Transform, and Load

59
Established
87 turbot/steampipe

Zero-ETL, infinite possibilities. Live query APIs, code & more with SQL. No...

59
Established
88 DataTalksClub/data-engineering-zoomcamp

Data Engineering Zoomcamp is a free 9-week course on building...

59
Established
89 rudderlabs/rudder-server

Privacy and Security focused Segment-alternative, in Golang and React

58
Established
90 dashmug/glue-utils

glue-utils makes AWS Glue jobs less repetitive, more type-safe, and easier...

58
Established
91 dagster-io/community-integrations

Community supported integrations for the Dagster platform.

58
Established
92 dfpc-coe/CloudTAK

TAK Compatible, browser based Common Operation Picture & Situational Awareness tool

57
Established
93 datazip-inc/olake-ui

Frontend & BFF (Backend for frontend) for Olake. This includes the UI code...

57
Established
94 benjamin-awd/monopoly

Monopoly is a Python library & CLI that converts bank statement PDFs to CSV.

57
Established
95 dataflint/spark

Drop-in replacement for Apache Spark UI

57
Established
96 starlake-ai/starlake

Declarative text based tool for data analysts and engineers to extract,...

56
Established
97 flowsynx/flowsynx

A deterministic orchestrator for composable micro-workflows with reusable modules

56
Established
98 reductstore/reductstore

High Performance Storage and Streaming Solution for Data Acquisition Systems

56
Established
99 DataSQRL/sqrl

Data Pipeline Automation Framework to build MCP servers, data APIs, and data...

55
Established
100 odpi/egeria-docs

Documentation repository for the Egeria project.

55
Established
101 kay-ou/SimTradeData

SimTradeData is a utility library supporting SimTradeDesk, SimTradeLab and...

55
Established
102 Guepard-Corp/qwery-core

The Boring query platform - Connect and query anything

55
Established
103 OHDSI/ETL-Synthea

A package supporting the conversion from Synthea CSV to OMOP CDM

54
Established
104 microsoft/unified-data-foundation-with-fabric-solution-accelerator

Unified Data Foundation with Microsoft Fabric with Options to Integrate with...

54
Established
105 dflib/dflib

In-memory Java DataFrame library

54
Established
106 akmalsoliev/Validoopsie

A simple and easy to use Data Validation library for Python.

54
Established
107 tower/tower-cli

Next generation compute platform for the post-modern data stack

54
Established
108 kanton-bern/hellodata-be

The Open-Source Enterprise Data Platform in a single Portal

54
Established
109 GovHub-br/gov-hub

GovHub - Transforming Data into Value for Public Management

53
Established
110 rpsft/etlbox

A lightweight ETL (extract, transform, load) library and data integration...

53
Established
111 mprove-io/mprove

Open Source Business Intelligence with Malloy Semantic Layer 🎉

53
Established
112 dbt-labs/jaffle-shop

🥪🦘 An open source sandbox project exploring dbt workflows via a fictional...

53
Established
113 opensnowcat/opensnowcat-enrich

OpenSnowcat Enricher (Apache 2.0 License)

53
Established
114 GitBrincie212/ChronoGrapher

Powerful, developer-experience centric, blazingly fast and extensible job...

53
Established
115 treeverse/charts

Helm charts

52
Established
116 fdmorison/tiozin

Tiozin, your friendly ETL framework

52
Established
117 Edwardvaneechoud/Flowfile

Flowfile is a visual ETL tool and Python library combining drag-and-drop...

52
Established
118 Bruno-Furtado/cloud-cnpj

Ingestion, preparation, and free distribution of CNPJ data from...

52
Established
119 AndreaBozzo/dataprof

Library and CLI for profiling tabular data

52
Established
120 halestudio/hale

(Spatial) data harmonisation with hale»studio (formerly HUMBOLDT Alignment Editor)

52
Established
121 DataRecce/recce

The data-validation toolkit for enhanced dbt (data build tool) PR review

52
Established
122 ara3d/bim-open-schema

Representing BIM Data as Parquet

52
Established
123 DataKitchen/data-observability-installer

Installer for DataKitchen's Open Source Data Observability Products. Data...

52
Established
124 bitol-io/open-data-product-standard

Home of the Open Data Product Standard (ODPS).

51
Established
125 xxh/xxh-shell-xonsh

Use @xonsh wherever you go through the SSH without installation on the host.

51
Established
126 wgzhao/addax-admin

Addax Admin is a web-based management console for Addax ETL jobs, offering...

51
Established
127 ogbinar/DataEngineeringPilipinas

Data Engineering Pilipinas is a community for data engineers, data analysts,...

50
Established
128 ashish10alex/vscode-dataform-tools

Dataform Tools - VS Code extension to run and visualise Dataform data...

50
Established
129 robert-koch-institut/mex-common

RKI Metadata Exchange | Software development toolkit for the MEx project...

50
Established
130 mehd-io/pypi-duck-flow

End-to-end data engineering project to get insights from PyPI using Python,...

50
Established
131 hbz/lobid-resources

Transformation, web frontend, and API for the hbz catalog as LOD

50
Established
132 GoPlasmatic/dataflow-rs

A high-performance rules engine for IFTTT-style automation in Rust with...

50
Established
133 GovHub-br/data-application-gov-hub

Gov-Hub Data Pipeline

50
Established
134 bbossgroups/bboss-elastic-tran

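bboss-datatran, open-sourced by bboss, is a data collection and unified stream/batch processing tool that provides data collection, cleansing and transformation, and unified stream/batch computation;...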

49
Emerging
135 weifuwan/seatunnel-web

SeaTunnel Web is a visual platform for building, managing, and monitoring...

49
Emerging
136 thadhutch/sports-quant

End-to-end NFL data pipeline that scrapes PFF grades and Pro Football...

49
Emerging
137 MilkMp/CIA-World-Factbooks-Archive-1990-2025

Complete structured archive of every CIA World Factbook edition from...

49
Emerging
138 irajhedayati/data-engineering

A set of Data Engineering tools online for public use

49
Emerging
139 SpareCores/sc-crawler

Pull and standardize data on cloud compute resources.

49
Emerging
140 koralium/flowtide

High-performance streaming SQL query engine designed for real-time data...

49
Emerging
141 scribe-org/Scribe-Server

Backend service for Scribe data downloads

48
Emerging
142 databricks-industry-solutions/python-data-sources

Quality python data sources for pyspark 4.x

48
Emerging
143 edrewitz/WxData

A Python library that acts as a client to download, pre-process and...

48
Emerging
144 catalyst-cooperative/ferc-xbrl-extractor

A tool for converting FERC filings published in XBRL into SQLite databases

48
Emerging
145 NeaByteLab/IDX-API

Indonesian Stock Exchange API wrapper for trading data integration.

48
Emerging
146 leftkats/awesome-greek-tech-jobs

A comprehensive map of companies that hire for tech jobs in Greece.

48
Emerging
147 MTSWebServices/onetl

One ETL tool to rule them all

48
Emerging
148 jordilin/gitar

Git all remotes. git cli tool that targets both Github and Gitlab

47
Emerging
149 apache/incubator-devlake-playground

Apache DevLake is an open-source dev data platform to ingest, analyze, and...

47
Emerging
150 dagster-io/dagster-open-platform

Dagster Labs' open-source data platform, built with Dagster.

47
Emerging
151 DawnbrandBots/yaml-yugi

A machine-readable, human-editable database of the Yu-Gi-Oh! Trading Card...

47
Emerging
152 Indexical-Metrics-Measure-Advisory/watchmen

Watchmen Platform is a low code data platform for data pipeline, meta data...

47
Emerging
153 elastiflow/pipelines

A lightweight Go framework for building stateful, real-time data pipelines....

47
Emerging
154 wilson-mok/demo

In this repository, you will find various demos and presentations I have...

46
Emerging
155 monarch-initiative/koza

Data transformation framework for LinkML data models

46
Emerging
156 AbsaOSS/pramen

Resilient data pipeline framework running on Apache Spark

46
Emerging
157 opensnowcat/opensnowcat-collector

OpenSnowcat Collector, an open source fork of Snowplow (Apache 2.0 License)

46
Emerging
158 PeopleForBikes/brokenspoke

A collection of tools for the BNA.

46
Emerging
159 Data-Research-Analysis/data-research-analysis-platform

Stop Guessing. Start Dominating Your Market. The only data platform built...

46
Emerging
160 wherobots/airflow-providers-wherobots

Airflow extensions for communicating with Wherobots Cloud

46
Emerging
161 mikevan666/opendataworks

opendataworks...

46
Emerging
162 mattlianje/etl4s

Powerful, whiteboard-style ETL

46
Emerging
163 wp-labs/warp-parse

Focusing on building industry-leading ETL engines.

46
Emerging
164 colliery-io/cloacina

Embedded workflow orchestration library for Rust and Python. Build...

46
Emerging
165 DevDizzle/gammarips-engine

An end-to-end, serverless AI platform built on Google Cloud that...

45
Emerging
166 hiero-hackers/analytics

Stay up to date with hiero organisation activity and contributor diversity

45
Emerging
167 terrylica/exness-data-preprocess

Professional forex tick data preprocessing with unified DuckDB storage,...

45
Emerging
168 AndreaBozzo/Ceres

Harvesting & Semantic search for open data portals

45
Emerging
169 GregoryKogan/yt-framework

Build scalable data pipelines on YTsaurus with automatic stage management,...

45
Emerging
170 AMPATH/etl-rest-server

This project hosts scripts to generate flat tables used for reporting purposes.

45
Emerging
171 CategoricalData/CQL

Categorical Query Language IDE

45
Emerging
172 B1AAB/EBA

An ML-first temporal graph of Bitcoin's on-chain fund flows.

45
Emerging
173 MTSWebServices/syncmaster-ui

Frontend for Syncmaster, no-code ETL tool. WIP

44
Emerging
174 ineelhere/forex-connect

Streamlit Connection to Explore Foreign Currency Exchange rates 💰 in real-time

44
Emerging
175 netxs2000/devops

DevOps Data Application Platform...

44
Emerging
176 rannd1nt/phaethon

Dimensional Data Pipeline & Semantic Data Engineering Framework

44
Emerging
177 moj-analytical-services/etl_manager

A python package to create a database on the platform using our moj data...

44
Emerging
178 ludovicschmetz-stack/datavow

Open-source data contract enforcement — define, sync dbt, validate, block,...

44
Emerging
179 DataKitchen/dataops-observability

DataOps Observability is part of DataKitchen's Open Source Data...

44
Emerging
180 realdatadriven/etlx

ETL / ELT Framework powered by DuckDB, designed to seamlessly integrate and...

44
Emerging
181 yanghaiji/JsonCleanseETL

JSONCleanseETL is a professional data cleansing and transformation tool designed to give users an efficient solution for processing JSON-formatted data...

44
Emerging
182 nationalarchives/ds-caselaw-ingester

Parse judgements from the Transformation Engine and load them into MarkLogic...

44
Emerging
183 trustedshops-public/schema2pyarrow

Converts AsyncApi and JsonSchema to PyArrow schema

43
Emerging
184 datacompose/datacompose

Data Cleaning for Pyspark

42
Emerging
185 DawnbrandBots/yaml-yugipedia

An automatically-updated collection of wikitexts from Yugipedia. Part of YAML Yugi.

42
Emerging
186 Beyond-Finance/dataeng-de-technical-assessment

Public repo of Beyond Finance's technical assessment for Data Engineering candidates

42
Emerging
187 ChrisDevRepo/vscode_data_lineage

VS Code extension for visualizing SQL Server database object dependencies...

42
Emerging
188 MTSWebServices/syncmaster

No-code ETL tool, based on onETL + PySpark

41
Emerging
189 ankiano/etl

Extract transform load CLI tool for extracting small and middle data volume...

41
Emerging
190 ccao-data/data-architecture

Codebase for CCAO data infrastructure construction and management

41
Emerging
191 chalk-ai/chalk-go

Go client for Chalk

41
Emerging
192 RustedBytes/audios-to-dataset

Convert your audio files into DuckDB or Parquet files

41
Emerging
193 Edwardvaneechoud/pyfloe

A minimal zero dependency dataframe library

41
Emerging
194 ivszhuravlev/spark-tuning-handbook

Hands-on Spark internals and performance engineering.

41
Emerging
195 markusbegerow/data-analytics-exercises

End-to-end data warehouse exercises for students - build a modern ELT...

41
Emerging
196 mahmoudparsian/data-warehousing

This repository is a place for the Data Warehousing course at the...

40
Emerging
197 DataKitchen/dataops-observability-agents

DataOps Observability Integration Agents are part of DataKitchen's Open...

40
Emerging
198 tenzir/library

Packages for the Tenzir ecosystem.

39
Emerging
199 nightmarewalker/D-MemFS

In-process virtual filesystem with hard quota for Python

39
Emerging
200 continuous-dems/fetchez

Fetchez is a lightweight, modular, and highly extendable Python framework...

39
Emerging
201 prefeitura-rio/pipelines_rj_smtr

Data capture and processing code for SMTR

39
Emerging
202 exasol/exasol-personal

The High-Performance Analytics Engine — Free for Personal Use

39
Emerging
203 gopidesupavan/qualink

Data quality validation, profiling, anomaly detection, and YAML-driven...

38
Emerging
204 ottogroup/koality

Library for data quality monitoring based on duckdb.

38
Emerging
205 BEKO2210/World_report

A self-updating global dashboard that aggregates 40+ open data sources...

38
Emerging
206 rush-db/rushdb

RushDB is an Instant Database for Modern Apps & AI. Built on top of Neo4j.

38
Emerging
207 vedanthv/data-engineering-portfolio

Cool DE Projects

38
Emerging
208 mbari-org/aidata

(ETL) Extract, transform, load/download and augment images and annotations...

38
Emerging
209 jtakish/airflow-provider-sap-hana

Airflow provider package for SAP HANA

37
Emerging
210 bitroot/coflux

Open-source workflow engine. Orchestrate and observe computational workflows...

37
Emerging
211 cderickson/Mox-Data.com

Mox-Data.com is a cloud-based data ingestion tool used to process raw data...

36
Emerging
212 Zipstack/visitran

Modern, AI-native and agentic Pythonic data transformation platform.

36
Emerging
213 bruin-data/setup-bruin

Official action to install Bruin CLI in Github Actions.

36
Emerging
214 richban/opendata-stack-platform

Open Data Stack Platform: a collection of projects and pipelines built with...

35
Emerging
215 peter115342/soccer-tracker-DE-project

End-To-End Data Engineering Project. Made to learn some common data...

35
Emerging
216 equitusai/arcxa

Mapping intelligence for enterprise data migrations: schema mapping,...

35
Emerging
217 TJAdryan/astro_blog

This site uses the amazing Astro.build project. I added Google Docs ...

35
Emerging
218 MTSWebServices/horizon

Simple HWM Store backend

34
Emerging
219 moj-analytical-services/iam_builder

Little helper to write IAM policies

34
Emerging
220 SourceWatcher/source-watcher-core

PHP ETL engine with pluggable steps: extractors, transformers, loaders

34
Emerging
221 vnvo/deltaforge

A versatile, high-performance Change Data Capture (CDC) engine built in...

34
Emerging
222 tvs-sde/oxford-omop-data-mapper

A documentation-centric DuckDB based ETL tool, implementing transformations...

34
Emerging
223 IgorNatann/project_e_commerce_dw

E-commerce DW (Kimball/Star Schema) in SQL Server, with scripts, data...

33
Emerging
224 MTSWebServices/etl-entities

Basic ETL Entity classes for onETL

33
Emerging
225 illuin-tech/data-pipeline

Library for describing data transformation pipelines by compositing simple...

33
Emerging
226 tracebloc/data-ingestors

tracebloc data pipeline for training/test dataset setup

33
Emerging
227 sopho-tech/sopho

Open Source Business Intelligence

33
Emerging
228 eventvisor/eventvisor

Fine-grained control over analytics events and logs via remote configuration

33
Emerging
229 neo-technology-field/python-etl-lib

simple lib of ETL building blocks

33
Emerging
230 sul-dlss/libsys-airflow

Airflow DAGS for migrating and managing ILS data into FOLIO along with other...

33
Emerging
231 MTSWebServices/horizon-hwm-store

Horizon HWM Store for onETL

32
Emerging
232 qweliant/ankaa

POC for real-time monitoring and alert system for home hemodialysis,...

32
Emerging
233 zovchik0v/task-management

🛠️ Streamline task management with this full-stack solution featuring...

31
Emerging
234 lyrasis/kiba-extend

Extensions to Kiba ETL

30
Emerging
235 everycure-org/kedro-argo

argo-kedro is a kedro-plugin for executing Kedro pipelines on Argo Workflows.

30
Emerging
236 tarek-clarke/resilient-rap-framework

A resilient, fault‑tolerant telemetry analytics pipeline designed to...

30
Emerging
237 vishnuvardhanaan/equity-fundamental-engine

Production-style financial data engineering pipeline that standardizes NSE...

29
Experimental
238 lezwon/CatalystOps

Semantic cost-linting and performance warnings extension for Databricks in VS Code

29
Experimental
239 Hyperwindmill/morphql

Transform data with queries

29
Experimental
240 betoalien/PardoX

PardoX: The Hyper-Fast Data Engine

29
Experimental
241 calbergs/spotify-api

Pipeline that extracts data from the Spotify API to build a more detailed...

29
Experimental
242 adhamhaithameid/Classroom-Quick-Downloader

A sophisticated cross-browser extension for bulk Google Classroom downloads,...

29
Experimental
243 elevata-labs/elevata

elevata is an Architecture Runtime for modern data platforms —...

28
Experimental
244 faltz009/Closure-SDK

A hash you can do algebra on — composable verification for ordered data over...

27
Experimental
245 nvisycom/runtime

Enterprise-grade multimodal redaction runtime that detects and removes...

27
Experimental
246 nicopon/dtpipe

A simple, self-contained CLI for performance-focused data streaming & anonymization.

27
Experimental
247 Galaticos-API/API-3

API project for the first semester of 2026

27
Experimental
248 vishnuvardhanaan/equity-fundamental-analytics

Macro-aware, explainable equity analytics system using Bronze–Silver–Gold...

27
Experimental
249 tbrus/smartjoin

Deterministic key and join discovery for structured datasets

27
Experimental
250 TheCocoTeam/source-watcher-core

PHP ETL engine for building extract–transform–load pipelines with pluggable...

27
Experimental
251 edwinweber/dbt_duckdb_demo_public

Data engineering demo project for Danish Parliament (Folketing) open data —...

26
Experimental
252 raphaelberly/journal

A movie journal coupled with open IMDb data, and a Flask web-app for easy...

26
Experimental
253 RaySatish/Market-Surveillance-System

Big-data pipeline detecting wash trading, pump & dump, and spoofing in trade...

26
Experimental
254 salimt/Transfermarkt-ETL-and-LIVE-Scores

asyncIO, Github Actions, GCP, dbt, Terraform, Docker

25
Experimental
255 pandabear-neil/microsoft_fabric_mods

Code Snippets, Designs, and other things about building a Data Analytics...

24
Experimental
256 tosh2230/stairlight

A data lineage tool detects table dependencies from rendered SQL statements.

23
Experimental
257 turki-alhumaid/8-week-sql-challenge-tsql

My Solutions to Danny Ma's 8 Week SQL Challenge — built in T-SQL on SQL Server

19
Experimental
258 arnienemeth/industry-intel-generator

Automated weekly tech trend reports — built with Claude Code + Claude Cowork

19
Experimental
259 turki-alajmi/8-Week-TSQL-Challenge

My Solutions to Danny Ma's 8 Week SQL Challenge — built in T-SQL on SQL Server

17
Experimental
260 turki-alajmi/8-week-sql-challenge-tsql

My Solutions to Danny Ma's 8 Week SQL Challenge — built in T-SQL on SQL Server

17
Experimental
261 anwitars/grab

High-performance, declarative stream processor for delimited text data.

17
Experimental