All Data Engineering Tools
1,297 tools ranked by quality score · Page 2 of 13
| # | Tool | Description | Score | Tier |
|---|---|---|---|---|
| 101 | apache/wayang | Apache Wayang is the first cross-platform data processing system. | | Established |
| 102 | Breeze0806/go-etl | go-etl is a toolset for data extraction, transformation and loading. | | Established |
| 103 | DataKitchen/dataops-testgen | DataOps Data Quality TestGen is part of DataKitchen's Open Source Data... | | Established |
| 104 | quixio/quix-streams | Python Streaming DataFrames for Kafka | | Established |
| 105 | langchain-ai/langchain-postgres | LangChain abstractions backed by Postgres Backend | | Established |
| 106 | HTTP-RPC/Kilo | Lightweight REST for Java | | Established |
| 107 | cnstlungu/portable-data-stack-dagster | A portable Datamart and Business Intelligence suite built with Docker,... | | Established |
| 108 | bitol-io/open-data-contract-standard | Home of the Open Data Contract Standard (ODCS). | | Established |
| 109 | jtablesaw/tablesaw | Java dataframe and visualization library | | Established |
| 110 | turbot/steampipe-plugin-github | Use SQL to instantly query repositories, users, gists and more from GitHub.... | | Established |
| 111 | linkedpipes/etl | LinkedPipes ETL is an RDF based, lightweight ETL tool | | Established |
| 112 | vmware/versatile-data-kit | One framework to develop, deploy and operate data workflows with Python and SQL. | | Established |
| 113 | bacalhau-project/bacalhau | Community-driven, simple, yet powerful framework for fast, cost-effective... | | Established |
| 114 | KipData/KiteSQL | Fast. Embedded. Rust-native SQL database. | | Established |
| 115 | metafacture/metafacture-core | Core package of the Metafacture tool suite for metadata processing. | | Established |
| 116 | RumbleDB/rumble | Quick start: pip install jsoniq ⛈️ RumbleDB 2.0.0 "Lemon Ironwood" 🌳 for... | | Established |
| 117 | AbsaOSS/cobrix | A COBOL parser and Mainframe/EBCDIC data source for Apache Spark | | Established |
| 118 | dalenewman/Transformalize | Configurable Extract, Transform, and Load | | Established |
| 119 | dotnet/spark | .NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers. | | Established |
| 120 | ucbepic/docetl | A system for agentic LLM-powered data processing and ETL | | Established |
| 121 | vibhorkum/pg_background | Production-grade PostgreSQL extension to execute arbitrary SQL in background... | | Established |
| 122 | turbot/steampipe | Zero-ETL, infinite possibilities. Live query APIs, code & more with SQL. No... | | Established |
| 123 | DataTalksClub/data-engineering-zoomcamp | Data Engineering Zoomcamp is a free 9-week course on building... | | Established |
| 124 | cparmet/pandas-checks | 🐼🩺 Pandas Checks: Non-invasive health checks for Pandas method chains | | Established |
| 125 | rudderlabs/rudder-server | Privacy and Security focused Segment-alternative, in Golang and React | | Established |
| 126 | uptake/uptasticsearch | An Elasticsearch client tailored to data science workflows. | | Established |
| 127 | dashmug/glue-utils | glue-utils makes AWS Glue jobs less repetitive, more type-safe, and easier... | | Established |
| 128 | Data-Centric-AI-Community/ydata-profiling | 1 Line of code data quality profiling & exploratory data analysis for Pandas... | | Established |
| 129 | xorq-labs/xorq | A compute manifest and composable tools for data, built on Ibis, DataFusion,... | | Established |
| 130 | snowplow/enrich | Snowplow Enrichment jobs and library | | Established |
| 131 | dagster-io/community-integrations | Community supported integrations for the Dagster platform. | | Established |
| 132 | 9tigerio/db2rest | Instant no code DATA API platform for relational databases. Connect any... | | Established |
| 133 | h2oai/sparkling-water | Sparkling Water provides H2O functionality inside Spark cluster | | Established |
| 134 | dfpc-coe/CloudTAK | TAK Compatible, browser based Common Operation Picture & Situational Awareness tool | | Established |
| 135 | datazip-inc/olake-ui | Frontend & BFF (Backend for frontend) for Olake. This includes the UI code... | | Established |
| 136 | nshiab/simple-data-analysis | Easy-to-use and high-performance TypeScript library for data analysis. Works... | | Established |
| 137 | turbot/steampipe-plugin-kubernetes | Use SQL to instantly query Kubernetes API resources. Open source CLI. No DB required. | | Established |
| 138 | benjamin-awd/monopoly | Monopoly is a Python library & CLI that converts bank statement PDFs to CSV. | | Established |
| 139 | evinism/mistql | A query / expression language for performing computations on JSON-like... | | Established |
| 140 | turbot/steampipe-plugin-gcp | Use SQL to instantly query GCP resources across regions, projects and... | | Established |
| 141 | dataflint/spark | Drop-in replacement for Apache Spark UI | | Established |
| 142 | debba/tabularis | A lightweight, developer-focused database management tool. Supports MySQL,... | | Established |
| 143 | turbot/steampipe-plugin-azure | Use SQL to instantly query Azure resources across regions and subscriptions.... | | Established |
| 144 | CogStack/CogStack-NiFi | Building data processing pipelines for documents processing with NLP using... | | Established |
| 145 | snowplow/dbt-snowplow-web | A fully incremental model, that transforms raw web event data generated by... | | Established |
| 146 | ICIJ/extract | A cross-platform command line tool for parallelised content extraction and analysis. | | Established |
| 147 | alibaba/feathub | FeatHub - A stream-batch unified feature store for real-time machine learning | | Established |
| 148 | starlake-ai/starlake | Declarative text based tool for data analysts and engineers to extract,... | | Established |
| 149 | flowsynx/flowsynx | A deterministic orchestrator for composable micro-workflows with reusable modules | | Established |
| 150 | mozilla/python_mozetl | ETL jobs for Firefox Telemetry | | Established |
| 151 | techascent/tech.ml.dataset | A Clojure high performance data processing system | | Established |
| 152 | reductstore/reductstore | High Performance Storage and Streaming Solution for Data Acquisition Systems | | Established |
| 153 | DataSQRL/sqrl | Data Pipeline Automation Framework to build MCP servers, data APIs, and data... | | Established |
| 154 | nodestream-proj/nodestream | A Declarative framework for Building, Maintaining, and Analyzing Graph Data | | Established |
| 155 | odpi/egeria-docs | Documentation repository for the Egeria project. | | Established |
| 156 | kay-ou/SimTradeData | SimTradeData is a utility library supporting SimTradeDesk, SimTradeLab and... | | Established |
| 157 | Guepard-Corp/qwery-core | The Boring query platform - Connect and query anything | | Established |
| 158 | Snowflake-Labs/emerging-solutions-toolbox | The Emerging Solutions Toolbox is a collection of solutions created by... | | Established |
| 159 | turbot/steampipe-plugin-sdk | Steampipe Plugin SDK is a simple abstraction layer to write a Steampipe... | | Established |
| 160 | docwire/docwire | DocWire SDK: Award-winning modern data processing in C++20. SourceForge... | | Established |
| 161 | OHDSI/ETL-Synthea | A package supporting the conversion from Synthea CSV to OMOP CDM | | Established |
| 162 | cnstlungu/portable-data-stack-mage | A portable Datamart and Business Intelligence suite built with Docker, Mage,... | | Established |
| 163 | microsoft/unified-data-foundation-with-fabric-solution-accelerator | Unified Data Foundation with Microsoft Fabric with Options to Integrate with... | | Established |
| 164 | apache/doris-kafka-connector | Kafka Connector for Apache Doris | | Established |
| 165 | airyhq/airy | 💬 Open Source App Framework to build streaming apps with real-time data - 💎... | | Established |
| 166 | turbot/steampipe-plugin-jira | Use SQL to instantly query Jira. Open source CLI. No DB required. | | Established |
| 167 | dflib/dflib | In-memory Java DataFrame library | | Established |
| 168 | akmalsoliev/Validoopsie | A simple and easy to use Data Validation library for Python. | | Established |
| 169 | heavyai/heavydb | HeavyDB (formerly MapD/OmniSciDB) | | Established |
| 170 | tower/tower-cli | Next generation compute platform for the post-modern data stack | | Established |
| 171 | kanton-bern/hellodata-be | The Open-Source Enterprise Data Platform in a single Portal | | Established |
| 172 | cnstlungu/portable-data-stack-airflow | A portable Datamart and Business Intelligence suite built with Docker,... | | Established |
| 173 | bytewax/bytewax | Python Stream Processing | | Established |
| 174 | GovHub-br/gov-hub | GovHub - Transforming Data into Value for Public Management | | Established |
| 175 | rpsft/etlbox | A lightweight ETL (extract, transform, load) library and data integration... | | Established |
| 176 | elyra-ai/pipeline-editor | Common pipeline-editor components used in different clients (e.g. Elyra... | | Established |
| 177 | mprove-io/mprove | Open Source Business Intelligence with Malloy Semantic Layer :tada: | | Established |
| 178 | dbt-labs/jaffle-shop | 🥪🦘 An open source sandbox project exploring dbt workflows via a fictional... | | Established |
| 179 | opensnowcat/opensnowcat-enrich | OpenSnowcat Enricher (Apache 2.0 License) | | Established |
| 180 | GitBrincie212/ChronoGrapher | Powerful, developer-experience centric, blazingly fast and extensible job... | | Established |
| 181 | caiopizzol/cnpj-data-pipeline | Open-source pipeline that downloads and processes Receita Federal data into PostgreSQL | | Established |
| 182 | spitfireuptown/datalinkx | 🔥🔥 DatalinkX, a data synchronization system for heterogeneous data sources that supports incremental or full synchronization of massive data volumes, as well as data flow between HTTP, Oracle, MySQL, ES, and other data sources,... | | Established |
| 183 | SentryPeer/SentryPeer | Protect your SIP Servers from bad actors at https://sentrypeer.org | | Established |
| 184 | arkflow-rs/arkflow | High performance Rust stream processing engine seamlessly integrates AI... | | Established |
| 185 | kalininalab/DataSAIL | DataSAIL is a tool to split datasets while reducing information leakage. | | Established |
| 186 | zazuko/barnard59 | An intuitive and flexible RDF pipeline solution designed to simplify and... | | Established |
| 187 | treeverse/charts | Helm charts | | Established |
| 188 | turbot/steampipe-plugin-slack | Use SQL to instantly query users, channels, emoji and more from your Slack... | | Established |
| 189 | fdmorison/tiozin | Tiozin, your friendly ETL framework | | Established |
| 190 | evdubs/oic-options-chains | ETL for OIC Options Chains | | Established |
| 191 | Edwardvaneechoud/Flowfile | Flowfile is a visual ETL tool and Python library combining drag-and-drop... | | Established |
| 192 | Bruno-Furtado/cloud-cnpj | Ingestion, preparation, and free distribution of CNPJ data from... | | Established |
| 193 | PFund-Software-Ltd/pfeed | Data Engine for Manual/Algo Trading: Download/Stream -> Clean -> Store.... | | Established |
| 194 | AndreaBozzo/dataprof | Library and CLI for profiling tabular data | | Established |
| 195 | lakevision-project/lakevision | Lakevision is a tool which provides insights into your Apache Iceberg based... | | Established |
| 196 | halestudio/hale | (Spatial) data harmonisation with hale»studio (formerly HUMBOLDT Alignment Editor) | | Established |
| 197 | DataRecce/recce | The data-validation toolkit for enhanced dbt (data build tool) PR review | | Established |
| 198 | ara3d/bim-open-schema | Representing BIM Data as Parquet | | Established |
| 199 | DataKitchen/data-observability-installer | Installer for DataKitchen's Open Source Data Observability Products. Data... | | Established |
| 200 | turbot/steampipe-plugin-azuread | Use SQL to instantly query groups, service principals, users and more from... | | Established |