Trending Data Engineering Tools

Tools with the biggest quality score improvements over the last 14 days.

Rank Tool Change Score Tier
1 aws/aws-sdk-pandas

pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream,...

+11 70 Verified
2 sodadata/soda-core

Data Contracts engine for the modern data stack. https://www.soda.io

+10 70 Verified
3 amphi-ai/amphi-etl

visual data prep powered by python

+10 67 Established
4 koopjs/koop

Transform, query, and download geospatial data on the web.

+10 89 Verified
5 dotflow-io/dotflow

🎲 Business Logic Code in a flow!

+8 65 Established
6 reductstore/reductstore

High Performance Storage and Streaming Solution for Data Acquisition Systems

+8 56 Established
7 quixio/quix-streams

Python Streaming DataFrames for Kafka

+8 60 Established
8 turbot/steampipe-plugin-aws

Use SQL to instantly query AWS resources across regions and accounts. Open...

+7 63 Established
9 data-engineering-community/data-engineering-wiki

The best place to learn data engineering. Built and maintained by the data...

+7 63 Established
10 apache/hop

Hop Orchestration Platform

+7 76 Verified
11 turbot/steampipe-plugin-github

Use SQL to instantly query repositories, users, gists and more from GitHub....

+7 59 Established
12 wgzhao/Addax

A fast and versatile ETL tool that can transfer data between RDBMS and NoSQL...

+7 71 Verified
13 opensnowcat/opensnowcat-enrich

OpenSnowcat Enricher (Apache 2.0 License)

+7 53 Established
14 wilson-mok/demo

In this repository, you will find various demos and presentations I have...

+7 46 Emerging
15 pandabear-neil/microsoft_fabric_mods

Code Snippets, Designs, and other things about building a Data Analytics...

+7 24 Experimental
16 salimt/Transfermarkt-ETL-and-LIVE-Scores

asyncIO, Github Actions, GCP, dbt, Terraform, Docker

+7 25 Experimental
17 dagster-io/community-integrations

Community supported integrations for the Dagster platform.

+7 58 Established
18 ccao-data/data-architecture

Codebase for CCAO data infrastructure construction and management

+7 41 Emerging
19 tosh2230/stairlight

A data lineage tool that detects table dependencies from rendered SQL statements.

+7 23 Experimental
20 RustedBytes/audios-to-dataset

Convert your audio files into DuckDB or Parquet files

+7 41 Emerging
21 Data-Research-Analysis/data-research-analysis-platform

Stop Guessing. Start Dominating Your Market. The only data platform built...

+7 46 Emerging
22 jtakish/airflow-provider-sap-hana

Airflow provider package for SAP HANA

+7 37 Emerging
23 odpi/egeria-docs

Documentation repository for the Egeria project.

+7 55 Established
24 catalyst-cooperative/pudl

The Public Utility Data Liberation Project provides analysis-ready energy...

+7 76 Verified
25 fkie-cad/Logprep

log data pre processing, generation and shipping in python

+7 62 Established
26 datavane/tis

Supports agile DataOps based on Flink, DataX, Flink-CDC, and Chunjun, with a web UI

+7 71 Verified
27 dataform-co/dataform

Dataform is a framework for managing SQL based data operations in BigQuery

+7 71 Verified
28 apecloud/ape-dts

ApeCloud's Data Transfer Suite, written in Rust. Provides ultra-fast data...

+7 68 Established
29 DataSQRL/sqrl

Data Pipeline Automation Framework to build MCP servers, data APIs, and data...

+7 55 Established
30 GovHub-br/gov-hub

GovHub - Transforming Data into Value for Public Administration

+7 53 Established
31 SQLMesh/sqlmesh

Scalable and efficient data transformation framework - backwards compatible with dbt.

+7 67 Established
32 digitalghost-dev/poke-cli

A hybrid CLI/TUI tool written in Go for viewing Pokémon data from the...

+7 42 Emerging
33 peter115342/soccer-tracker-DE-project

End-To-End Data Engineering Project. Made to learn some common data...

+7 35 Emerging
34 CategoricalData/CQL

Categorical Query Language IDE

+7 45 Emerging
35 mehd-io/pypi-duck-flow

end-to-end data engineering project to get insights from PyPi using python,...

+7 50 Established
36 ankiano/etl

Extract transform load CLI tool for extracting small and middle data volume...

+7 41 Emerging
37 Edwardvaneechoud/Flowfile

Flowfile is a visual ETL tool and Python library combining drag-and-drop...

+7 52 Established
38 rudderlabs/rudder-server

Privacy and Security focused Segment-alternative, in Golang and React

+7 58 Established
39 Multiwoven/multiwoven

🔥🔥🔥 Open source Reverse ETL - alternative to hightouch and census.

+7 62 Established
40 Indexical-Metrics-Measure-Advisory/watchmen

Watchmen Platform is a low code data platform for data pipeline, meta data...

+7 47 Emerging
41 dalenewman/Transformalize

Configurable Extract, Transform, and Load

+7 59 Established
42 tracebloc/data-ingestors

tracebloc data pipeline for training/test dataset setup

+7 33 Emerging
43 flowsynx/flowsynx

A deterministic orchestrator for composable micro-workflows with reusable modules

+7 56 Established
44 datacleaner/DataCleaner

The premier open source Data Quality solution

+7 67 Established
45 illuin-tech/data-pipeline

Library for describing data transformation pipelines by compositing simple...

+7 33 Emerging
46 MTSWebServices/onetl

One ETL tool to rule them all

+7 48 Emerging
47 vedanthv/data-engineering-portfolio

Cool DE Projects

+7 38 Emerging
48 odpi/egeria

Egeria core

+7 74 Verified
49 ConduitIO/conduit

Conduit streams data between data stores. Kafka Connect replacement. No JVM required.

+7 63 Established
50 DawnbrandBots/yaml-yugi

A machine-readable, human-editable database of the Yu-Gi-Oh! Trading Card...

+7 47 Emerging
51 cderickson/Mox-Data.com

Mox-Data.com is a cloud-based data ingestion tool used to process raw data...

+7 36 Emerging
52 datazip-inc/olake-ui

Frontend & BFF (Backend for frontend) for Olake. This includes the UI code...

+7 57 Established
53 bbossgroups/bboss-elastic-tran

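bboss-datatran is an open-source data collection and unified stream-batch processing tool from bboss, providing data collection, cleaning and transformation, and unified stream-batch computation;...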

+7 49 Emerging
54 moj-analytical-services/etl_manager

A python package to create a database on the platform using our moj data...

+7 44 Emerging
55 wgzhao/addax-admin

Addax Admin is a web-based management console for Addax ETL jobs, offering...

+7 51 Established
56 catalyst-cooperative/ferc-xbrl-extractor

A tool for converting FERC filings published in XBRL into SQLite databases

+7 48 Emerging
57 SpareCores/sc-crawler

Pull and standardize data on cloud compute resources.

+7 49 Emerging
58 hbz/lobid-resources

Transformation, web frontend, and API for the hbz catalog as LOD

+7 50 Established
59 stn1slv/awesome-integration

A curated list of awesome system integration software and resources.

+7 62 Established
60 raphaelberly/journal

A movie journal coupled with open IMDb data, and a Flask web-app for easy...

+7 26 Experimental
61 MTSWebServices/syncmaster

No-code ETL tool, based on onETL + PySpark

+7 41 Emerging
62 osalvador/ReplicaDB

ReplicaDB is an open source tool for database replication, designed for...

+7 63 Established
63 AbsaOSS/pramen

Resilient data pipeline framework running on Apache Spark

+7 46 Emerging
64 HTTP-RPC/Kilo

Lightweight REST for Java

+7 60 Established
65 starlake-ai/starlake

Declarative text based tool for data analysts and engineers to extract,...

+7 56 Established
66 jordilin/gitar

Git all remotes. git cli tool that targets both Github and Gitlab

+7 47 Emerging
67 DawnbrandBots/yaml-yugipedia

An automatically-updated collection of wikitexts from Yugipedia. Part of YAML Yugi.

+7 42 Emerging
68 apache/incubator-devlake-playground

Apache DevLake is an open-source dev data platform to ingest, analyze, and...

+7 47 Emerging
69 MTSWebServices/horizon

Simple HWM Store backend

+7 34 Emerging
70 prefeitura-rio/pipelines_rj_smtr

Code for capturing and processing SMTR data

+7 39 Emerging
71 sul-dlss/libsys-airflow

Airflow DAGS for migrating and managing ILS data into FOLIO along with other...

+7 33 Emerging
72 MTSWebServices/etl-entities

Basic ETL Entity classes for onETL

+7 33 Emerging
73 tenzir/library

Packages for the Tenzir ecosystem.

+7 39 Emerging
74 MTSWebServices/horizon-hwm-store

Horizon HWM Store for onETL

+7 32 Emerging
75 MTSWebServices/syncmaster-ui

Frontend for Syncmaster, no-code ETL tool. WIP

+7 44 Emerging
76 PeopleForBikes/brokenspoke

A collection of tools for the BNA.

+7 46 Emerging
77 nationalarchives/ds-caselaw-ingester

Parse judgements from the Transformation Engine and load them into MarkLogic...

+7 44 Emerging
78 ohs-foundation/fhir-data-pipes

A collection of tools for extracting FHIR resources and analytics services...

+7 63 Established
79 Breeze0806/go-etl

go-etl is a toolset for data extraction, transformation and loading.

+7 61 Established
80 opendatadiscovery/odd-platform

First open-source data discovery and observability platform. We make a life...

+7 62 Established
81 bitol-io/open-data-contract-standard

Home of the Open Data Contract Standard (ODCS).

+7 60 Established
82 Desbordante/desbordante-core

Desbordante is a high-performance data profiler that is capable of...

+7 63 Established
83 dagster-io/dagster-open-platform

Dagster Labs' open-source data platform, built with Dagster.

+7 47 Emerging
84 DataRecce/recce

The data-validation toolkit for enhanced dbt (data build tool) PR review

+7 52 Established
85 snowflakedb/snowpark-python

Snowflake Snowpark Python API

+7 64 Established
86 airbytehq/PyAirbyte

PyAirbyte brings the power of Airbyte to every Python developer.

+7 62 Established
87 halestudio/hale

(Spatial) data harmonisation with hale»studio (formerly HUMBOLDT Alignment Editor)

+7 52 Established
88 kanton-bern/hellodata-be

The Open-Source Enterprise Data Platform in a single Portal

+7 54 Established
89 chalk-ai/chalk-go

Go client for Chalk

+7 41 Emerging
90 DataKitchen/dataops-observability-agents

DataOps Observability Integration Agents are part of DataKitchen's Open...

+7 40 Emerging
91 dfpc-coe/CloudTAK

TAK Compatible, browser based Common Operation Picture & Situational Awareness tool

+7 57 Established
92 DataKitchen/data-observability-installer

Installer for DataKitchen's Open Source Data Observability Products. Data...

+7 52 Established
93 dlt-hub/verified-sources

Contribute to dlt verified sources 🔥

+7 61 Established
94 ashish10alex/vscode-dataform-tools

Dataform Tools - VS Code extension to run and visualise Dataform data...

+7 50 Established
95 fedspendingtransparency/usaspending-api

Server application to serve U.S. federal spending data via a RESTful API

+7 64 Established
96 dataflint/spark

Drop-in replacement for Apache Spark UI

+7 57 Established
97 xxh/xxh-shell-xonsh

Use @xonsh wherever you go through the SSH without installation on the host.

+7 51 Established
98 biglocalnews/warn-scraper

Command-line interface for downloading WARN Act notices of qualified plant...

+7 64 Established
99 realdatadriven/etlx

ETL / ELT Framework powered by DuckDB, designed to seamlessly integrate and...

+7 44 Emerging
100 Bruno-Furtado/cloud-cnpj

Ingestion, preparation, and free publication of data on CNPJs of...

+7 52 Established