All Data Engineering Tools
1,297 tools ranked by quality score
| # | Tool | Score | Tier |
|---|---|---|---|
| 1 | PrefectHQ/prefect: Prefect is a workflow orchestration framework for building resilient data... | | Verified |
| 2 | growthbook/growthbook: Open Source Feature Flags, Experimentation, and Product Analytics | | Verified |
| 3 | koopjs/koop: Transform, query, and download geospatial data on the web. | | Verified |
| 4 | pathwaycom/pathway: Python ETL framework for stream processing, real-time analytics, LLM... | | Verified |
| 5 | dagster-io/dagster: An orchestration platform for the development, production, and observation... | | Verified |
| 6 | supabase/supabase-py: Python Client for Supabase. Query Postgres from Flask, Django, FastAPI.... | | Verified |
| 7 | dlt-hub/dlt: data load tool (dlt) is an open source Python library that makes data... | | Verified |
| 8 | meltano/meltano: Meltano: the declarative code-first data integration engine that powers your... | | Verified |
| 9 | capitalone/locopy: locopy: Loading/Unloading to Redshift and Snowflake using Python. | | Verified |
| 10 | Unstructured-IO/unstructured: Convert documents to structured data effortlessly. Unstructured is... | | Verified |
| 11 | apache/hop: Hop Orchestration Platform | | Verified |
| 12 | apache/superset: Apache Superset is a Data Visualization and Data Exploration Platform | | Verified |
| 13 | airbytehq/airbyte: The leading data integration platform for ETL / ELT data pipelines from... | | Verified |
| 14 | pyjanitor-devs/pyjanitor: Clean APIs for data cleaning. Python implementation of R package Janitor | | Verified |
| 15 | apache/shardingsphere: Empowering Data Intelligence with Distributed SQL for Sharding, Scalability,... | | Verified |
| 16 | catalyst-cooperative/pudl: The Public Utility Data Liberation Project provides analysis-ready energy... | | Verified |
| 17 | debezium/debezium: Change data capture for a variety of databases. Please log issues at... | | Verified |
| 18 | quiltdata/quilt: Quilt is a Scientific Data Management Platform on AWS that helps teams and... | | Verified |
| 19 | bruin-data/ingestr: ingestr is a CLI tool to copy data between any databases with a single... | | Verified |
| 20 | apache/incubator-devlake: Apache DevLake is an open-source dev data platform to ingest, analyze, and... | | Verified |
| 21 | databricks/dbt-databricks: A dbt adapter for Databricks. | | Verified |
| 22 | odpi/egeria: Egeria core | | Verified |
| 23 | apache/flink-cdc: Flink CDC is a streaming data integration tool | | Verified |
| 24 | thorsten/phpMyFAQ: phpMyFAQ - Open Source FAQ web application for PHP 8.3+ and MySQL,... | | Verified |
| 25 | steedos/steedos-platform: The AI-Native Infrastructure for Enterprise Apps. Powered by ObjectStack... | | Verified |
| 26 | apache/seatunnel: SeaTunnel is a multimodal, high-performance, distributed, massive data... | | Verified |
| 27 | dathere/qsv: Blazing-fast Data-Wrangling toolkit | | Verified |
| 28 | open-metadata/OpenMetadata: OpenMetadata is a unified metadata platform for data discovery, data... | | Verified |
| 29 | datazip-inc/olake: OLake - Fastest Databases, Kafka & S3 Replication to Apache Iceberg or Plain... | | Verified |
| 30 | nordquant/complete-dbt-bootcamp-zero-to-hero: Supplementary Materials for The Complete dbt (Data Build Tool) Bootcamp... | | Verified |
| 31 | nightscape/spark-excel: A Spark plugin for reading and writing Excel files | | Verified |
| 32 | datavane/tis: Support agile DataOps Based on Flink, DataX and Flink-CDC, Chunjun with Web-UI | | Verified |
| 33 | vectordotdev/vector: A high-performance observability data pipeline. | | Verified |
| 34 | ariacom/Seal-Report: Database Reporting Tool and Tasks (.Net) | | Verified |
| 35 | dataform-co/dataform: Dataform is a framework for managing SQL based data operations in BigQuery | | Verified |
| 36 | datavane/datavines: Know your data better! Datavines is a Next-gen Data Observability Platform,... | | Verified |
| 37 | elastic/eland: Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL... | | Verified |
| 38 | wgzhao/Addax: A fast and versatile ETL tool that can transfer data between RDBMS and NoSQL... | | Verified |
| 39 | sodadata/soda-core: Data Contracts engine for the modern data stack. https://www.soda.io | | Verified |
| 40 | crate/crate: CrateDB is a distributed and scalable SQL database for storing and analyzing... | | Verified |
| 41 | cloudquery/cloudquery: Data pipelines for cloud config and security data. Build cloud asset... | | Verified |
| 42 | dagucloud/dagu: A local-first workflow engine built the way it should be: declarative,... | | Verified |
| 43 | risingwavelabs/risingwave: Event streaming platform for agents, apps, and analytics. Continuously... | | Verified |
| 44 | dagu-org/dagu: A local-first workflow engine built the way it should be: declarative,... | | Verified |
| 45 | dbeaver/dbeaver: Free universal database tool and SQL client | | Verified |
| 46 | aws/aws-sdk-pandas: pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream,... | | Verified |
| 47 | datajoint/datajoint-python: Relational data pipelines for the science lab | | Established |
| 48 | snowplow/snowplow: The leader in Customer Data Infrastructure | | Established |
| 49 | PeerDB-io/peerdb: Fast, Simple and a cost effective tool to replicate data from Postgres to... | | Established |
| 50 | treeverse/lakeFS: lakeFS - Data version control for your data lake \| Git for data | | Established |
| 51 | knime/knime-core: KNIME Analytics Platform | | Established |
| 52 | mage-ai/mage-ai: 🧙 Build, run, and manage data pipelines for integrating and transforming data. | | Established |
| 53 | apecloud/ape-dts: ApeCloud's Data Transfer Suite, written in Rust. Provides ultra-fast data... | | Established |
| 54 | iBridges-for-iRODS/iBridges: A wrapper around the python-irodsclient to allow for easy interaction with... | | Established |
| 55 | SQLMesh/sqlmesh: Scalable and efficient data transformation framework - backwards compatible with dbt. | | Established |
| 56 | amphi-ai/amphi-etl: visual data prep powered by python | | Established |
| 57 | scribe-org/Scribe-Data: Wikidata and Wiktionary language data extraction | | Established |
| 58 | vietvudanh/vietlott-data: Automation fetching data for Vietlott. Just for fun. | | Established |
| 59 | datacleaner/DataCleaner: The premier open source Data Quality solution | | Established |
| 60 | elementary-data/elementary: The dbt-native data observability solution for data & analytics engineers.... | | Established |
| 61 | networktocode/diffsync: A utility library for comparing and synchronizing different datasets. | | Established |
| 62 | rusq/slackdump: Save or export your private and public Slack messages, threads, files, and... | | Established |
| 63 | dotflow-io/dotflow: 🎲 Business Logic Code in a flow! | | Established |
| 64 | mayneyao/eidos: An extensible framework for Personal Data Management. | | Established |
| 65 | biglocalnews/warn-transformer: Consolidate, enrich and republish the data gathered by warn-scraper | | Established |
| 66 | vaexio/vaex: Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML,... | | Established |
| 67 | slingdata-io/sling-cli: Sling is a CLI tool that extracts data from a source storage/database and... | | Established |
| 68 | apache/hamilton: Apache Hamilton helps data scientists and engineers define testable,... | | Established |
| 69 | timeplus-io/proton: ⚡ Fastest SQL ETL pipeline in a single C++ binary, built for stream... | | Established |
| 70 | snowflakedb/snowpark-python: Snowflake Snowpark Python API | | Established |
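| 71 | risesoft-y9/DataFlow-Engine: DataFlow Engine is a low-level data-driven engine for data integration, data synchronization, data exchange, data sharing, task configuration, and task scheduling. It uses an architecture of separated management and execution, multiple flow layers, and a plugin library to handle large-scale... | | Established |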
| 72 | biglocalnews/warn-scraper: Command-line interface for downloading WARN Act notices of qualified plant... | | Established |
| 73 | VisActor/VStory: Use data to tell stories. An intelligent Visualization Narrative Development... | | Established |
| 74 | fedspendingtransparency/usaspending-api: Server application to serve U.S. federal spending data via a RESTful API | | Established |
| 75 | fugue-project/fugue: A unified interface for distributed computing. Fugue executes SQL, Python,... | | Established |
| 76 | Desbordante/desbordante-core: Desbordante is a high-performance data profiler that is capable of... | | Established |
| 77 | laminlabs/lamindb: Open-source data framework for biology. Context and memory for datasets and... | | Established |
| 78 | bitpicky/dbt-sugar: dbt-sugar is a CLI tool that allows users of dbt to have fun and ease... | | Established |
| 79 | datagouv/csv-detective: Inspection of tabular (csv, xls-like) files to guess the columns' content | | Established |
| 80 | osalvador/ReplicaDB: ReplicaDB is open source tool for database replication, designed for... | | Established |
| 81 | ConduitIO/conduit: Conduit streams data between data stores. Kafka Connect replacement. No JVM required. | | Established |
| 82 | redpanda-data/connect: Fancy stream processing made operationally mundane | | Established |
| 83 | insitro/redun: Yet another redundant workflow engine | | Established |
| 84 | ohs-foundation/fhir-data-pipes: A collection of tools for extracting FHIR resources and analytics services... | | Established |
| 85 | turbot/steampipe-plugin-aws: Use SQL to instantly query AWS resources across regions and accounts. Open... | | Established |
| 86 | data-engineering-community/data-engineering-wiki: The best place to learn data engineering. Built and maintained by the data... | | Established |
| 87 | TianLangStudio/DataXServer: Provides remote multi-language invocation for DataX (https://github.com/alibaba/DataX) via ThriftServer, HttpServer... | | Established |
| 88 | Multiwoven/multiwoven: 🔥🔥🔥 Open source Reverse ETL - alternative to hightouch and census. | | Established |
| 89 | fkie-cad/Logprep: log data pre processing, generation and shipping in python | | Established |
| 90 | edkreuk/FMD_FRAMEWORK: The Fabric Metadata-Driven Framework (FMD) is a cutting-edge accelerator... | | Established |
| 91 | opendatadiscovery/odd-platform: First open-source data discovery and observability platform. We make a life... | | Established |
| 92 | astronomer/airflow-provider-fivetran-async: A new Airflow Provider for Fivetran, maintained by Astronomer and Fivetran | | Established |
| 93 | cre-dev/xml2db: A Python package to load complex XML files into a relational database | | Established |
| 94 | stn1slv/awesome-integration: A curated list of awesome system integration software and resources. | | Established |
| 95 | airbytehq/PyAirbyte: PyAirbyte brings the power of Airbyte to every Python developer. | | Established |
| 96 | HariSekhon/SQL-scripts: 100+ SQL Scripts - PostgreSQL, MySQL, Oracle, Google BigQuery, MariaDB, AWS... | | Established |
| 97 | sparklyr/sparklyr: R interface for Apache Spark | | Established |
| 98 | hyparam/icebird: Icebird: JavaScript Iceberg Client | | Established |
| 99 | neo4j/neo4j-jdbc: Official Neo4j JDBC Driver | | Established |
| 100 | dlt-hub/verified-sources: Contribute to dlt verified sources 🔥 | | Established |