All Data Engineering Tools

1,297 tools ranked by quality score

Showing 1–100 of 1,297
# Tool Score Tier
1 PrefectHQ/prefect

Prefect is a workflow orchestration framework for building resilient data...

95
Verified
2 growthbook/growthbook

Open Source Feature Flags, Experimentation, and Product Analytics

90
Verified
3 koopjs/koop

Transform, query, and download geospatial data on the web.

89
Verified
4 pathwaycom/pathway

Python ETL framework for stream processing, real-time analytics, LLM...

85
Verified
5 dagster-io/dagster

An orchestration platform for the development, production, and observation...

84
Verified
6 supabase/supabase-py

Python Client for Supabase. Query Postgres from Flask, Django, FastAPI....

81
Verified
7 dlt-hub/dlt

data load tool (dlt) is an open source Python library that makes data...

80
Verified
8 meltano/meltano

Meltano: the declarative code-first data integration engine that powers your...

79
Verified
9 capitalone/locopy

locopy: Loading/Unloading to Redshift and Snowflake using Python.

79
Verified
10 Unstructured-IO/unstructured

Convert documents to structured data effortlessly. Unstructured is...

79
Verified
11 apache/hop

Hop Orchestration Platform

76
Verified
12 apache/superset

Apache Superset is a Data Visualization and Data Exploration Platform

76
Verified
13 airbytehq/airbyte

The leading data integration platform for ETL / ELT data pipelines from...

76
Verified
14 pyjanitor-devs/pyjanitor

Clean APIs for data cleaning. Python implementation of R package Janitor

76
Verified
15 apache/shardingsphere

Empowering Data Intelligence with Distributed SQL for Sharding, Scalability,...

76
Verified
16 catalyst-cooperative/pudl

The Public Utility Data Liberation Project provides analysis-ready energy...

76
Verified
17 debezium/debezium

Change data capture for a variety of databases. Please log issues at...

76
Verified
18 quiltdata/quilt

Quilt is a Scientific Data Management Platform on AWS that helps teams and...

75
Verified
19 bruin-data/ingestr

ingestr is a CLI tool to copy data between any databases with a single...

74
Verified
20 apache/incubator-devlake

Apache DevLake is an open-source dev data platform to ingest, analyze, and...

74
Verified
21 databricks/dbt-databricks

A dbt adapter for Databricks.

74
Verified
22 odpi/egeria

Egeria core

74
Verified
23 apache/flink-cdc

Flink CDC is a streaming data integration tool

74
Verified
24 thorsten/phpMyFAQ

phpMyFAQ - Open Source FAQ web application for PHP 8.3+ and MySQL,...

73
Verified
25 steedos/steedos-platform

The AI-Native Infrastructure for Enterprise Apps. Powered by ObjectStack...

73
Verified
26 apache/seatunnel

SeaTunnel is a multimodal, high-performance, distributed, massive data...

73
Verified
27 dathere/qsv

Blazing-fast Data-Wrangling toolkit

73
Verified
28 open-metadata/OpenMetadata

OpenMetadata is a unified metadata platform for data discovery, data...

72
Verified
29 datazip-inc/olake

OLake - Fastest Databases, Kafka & S3 Replication to Apache Iceberg or Plain...

72
Verified
30 nordquant/complete-dbt-bootcamp-zero-to-hero

Supplementary Materials for The Complete dbt (Data Build Tool) Bootcamp...

71
Verified
31 nightscape/spark-excel

A Spark plugin for reading and writing Excel files

71
Verified
32 datavane/tis

Support agile DataOps Based on Flink, DataX and Flink-CDC, Chunjun with Web-UI

71
Verified
33 vectordotdev/vector

A high-performance observability data pipeline.

71
Verified
34 ariacom/Seal-Report

Database Reporting Tool and Tasks (.Net)

71
Verified
35 dataform-co/dataform

Dataform is a framework for managing SQL based data operations in BigQuery

71
Verified
36 datavane/datavines

Know your data better! Datavines is Next-gen Data Observability Platform,...

71
Verified
37 elastic/eland

Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL...

71
Verified
38 wgzhao/Addax

A fast and versatile ETL tool that can transfer data between RDBMS and NoSQL...

71
Verified
39 sodadata/soda-core

Data Contracts engine for the modern data stack. https://www.soda.io

70
Verified
40 crate/crate

CrateDB is a distributed and scalable SQL database for storing and analyzing...

70
Verified
41 cloudquery/cloudquery

Data pipelines for cloud config and security data. Build cloud asset...

70
Verified
42 dagucloud/dagu

A local-first workflow engine built the way it should be: declarative,...

70
Verified
43 risingwavelabs/risingwave

Event streaming platform for agents, apps, and analytics. Continuously...

70
Verified
44 dagu-org/dagu

A local-first workflow engine built the way it should be: declarative,...

70
Verified
45 dbeaver/dbeaver

Free universal database tool and SQL client

70
Verified
46 aws/aws-sdk-pandas

pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream,...

70
Verified
47 datajoint/datajoint-python

Relational data pipelines for the science lab

69
Established
48 snowplow/snowplow

The leader in Customer Data Infrastructure

69
Established
49 PeerDB-io/peerdb

Fast, Simple and a cost effective tool to replicate data from Postgres to...

69
Established
50 treeverse/lakeFS

lakeFS - Data version control for your data lake | Git for data

68
Established
51 knime/knime-core

KNIME Analytics Platform

68
Established
52 mage-ai/mage-ai

🧙 Build, run, and manage data pipelines for integrating and transforming data.

68
Established
53 apecloud/ape-dts

ApeCloud's Data Transfer Suite, written in Rust. Provides ultra-fast data...

68
Established
54 iBridges-for-iRODS/iBridges

A wrapper around the python-irodsclient to allow for easy interaction with...

67
Established
55 SQLMesh/sqlmesh

Scalable and efficient data transformation framework - backwards compatible with dbt.

67
Established
56 amphi-ai/amphi-etl

visual data prep powered by python

67
Established
57 scribe-org/Scribe-Data

Wikidata and Wiktionary language data extraction

67
Established
58 vietvudanh/vietlott-data

Automation fetching data for Vietlott. Just for fun.

67
Established
59 datacleaner/DataCleaner

The premier open source Data Quality solution

67
Established
60 elementary-data/elementary

The dbt-native data observability solution for data & analytics engineers....

66
Established
61 networktocode/diffsync

A utility library for comparing and synchronizing different datasets.

66
Established
62 rusq/slackdump

Save or export your private and public Slack messages, threads, files, and...

66
Established
63 dotflow-io/dotflow

🎲 Business Logic Code in a flow!

65
Established
64 mayneyao/eidos

An extensible framework for Personal Data Management.

65
Established
65 biglocalnews/warn-transformer

Consolidate, enrich and republish the data gathered by warn-scraper

65
Established
66 vaexio/vaex

Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML,...

65
Established
67 slingdata-io/sling-cli

Sling is a CLI tool that extracts data from a source storage/database and...

65
Established
68 apache/hamilton

Apache Hamilton helps data scientists and engineers define testable,...

65
Established
69 timeplus-io/proton

⚡ Fastest SQL ETL pipeline in a single C++ binary, built for stream...

65
Established
70 snowflakedb/snowpark-python

Snowflake Snowpark Python API

64
Established
71 risesoft-y9/DataFlow-Engine

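DataFlow Engine is a low-level, data-driven engine for data integration, data synchronization, data exchange, data sharing, task configuration, and task scheduling. It uses an architecture of separated management and execution, multiple flow layers, and a plugin library to handle large-scale...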

64
Established
72 biglocalnews/warn-scraper

Command-line interface for downloading WARN Act notices of qualified plant...

64
Established
73 VisActor/VStory

Use data to tell stories. An intelligent Visualization Narrative Development...

64
Established
74 fedspendingtransparency/usaspending-api

Server application to serve U.S. federal spending data via a RESTful API

64
Established
75 fugue-project/fugue

A unified interface for distributed computing. Fugue executes SQL, Python,...

64
Established
76 Desbordante/desbordante-core

Desbordante is a high-performance data profiler that is capable of...

63
Established
77 laminlabs/lamindb

Open-source data framework for biology. Context and memory for datasets and...

63
Established
78 bitpicky/dbt-sugar

dbt-sugar is a CLI tool that allows users of dbt to have fun and ease...

63
Established
79 datagouv/csv-detective

Inspection of tabular (csv, xls-like) files to guess the columns' content

63
Established
80 osalvador/ReplicaDB

ReplicaDB is open source tool for database replication, designed for...

63
Established
81 ConduitIO/conduit

Conduit streams data between data stores. Kafka Connect replacement. No JVM required.

63
Established
82 redpanda-data/connect

Fancy stream processing made operationally mundane

63
Established
83 insitro/redun

Yet another redundant workflow engine

63
Established
84 ohs-foundation/fhir-data-pipes

A collection of tools for extracting FHIR resources and analytics services...

63
Established
85 turbot/steampipe-plugin-aws

Use SQL to instantly query AWS resources across regions and accounts. Open...

63
Established
86 data-engineering-community/data-engineering-wiki

The best place to learn data engineering. Built and maintained by the data...

63
Established
87 TianLangStudio/DataXServer

Provides remote multi-language invocation (ThriftServer, HttpServer) for DataX (https://github.com/alibaba/DataX)...

62
Established
88 Multiwoven/multiwoven

🔥🔥🔥 Open source Reverse ETL - alternative to hightouch and census.

62
Established
89 fkie-cad/Logprep

log data pre processing, generation and shipping in python

62
Established
90 edkreuk/FMD_FRAMEWORK

The Fabric Metadata-Driven Framework (FMD) is a cutting-edge accelerator...

62
Established
91 opendatadiscovery/odd-platform

First open-source data discovery and observability platform. We make a life...

62
Established
92 astronomer/airflow-provider-fivetran-async

A new Airflow Provider for Fivetran, maintained by Astronomer and Fivetran

62
Established
93 cre-dev/xml2db

A Python package to load complex XML files into a relational database

62
Established
94 stn1slv/awesome-integration

A curated list of awesome system integration software and resources.

62
Established
95 airbytehq/PyAirbyte

PyAirbyte brings the power of Airbyte to every Python developer.

62
Established
96 HariSekhon/SQL-scripts

100+ SQL Scripts - PostgreSQL, MySQL, Oracle, Google BigQuery, MariaDB, AWS...

61
Established
97 sparklyr/sparklyr

R interface for Apache Spark

61
Established
98 hyparam/icebird

Icebird: JavaScript Iceberg Client

61
Established
99 neo4j/neo4j-jdbc

Official Neo4j JDBC Driver

61
Established
100 dlt-hub/verified-sources

Contribute to dlt verified sources 🔥

61
Established