Trending Data Engineering Tools
Tools with the biggest quality score improvements over the last 14 days.
| # | Tool | Change | Score | Tier |
|---|---|---|---|---|
| 1 | aws/aws-sdk-pandas: pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream,... | +11 | 70 | Verified |
| 2 | sodadata/soda-core: Data Contracts engine for the modern data stack. https://www.soda.io | +10 | 70 | Verified |
| 3 | amphi-ai/amphi-etl: visual data prep powered by python | +10 | 67 | Established |
| 4 | koopjs/koop: Transform, query, and download geospatial data on the web. | +10 | 89 | Verified |
| 5 | dotflow-io/dotflow: 🎲 Business Logic Code in a flow! | +8 | 65 | Established |
| 6 | reductstore/reductstore: High Performance Storage and Streaming Solution for Data Acquisition Systems | +8 | 56 | Established |
| 7 | quixio/quix-streams: Python Streaming DataFrames for Kafka | +8 | 60 | Established |
| 8 | turbot/steampipe-plugin-aws: Use SQL to instantly query AWS resources across regions and accounts. Open... | +7 | 63 | Established |
| 9 | data-engineering-community/data-engineering-wiki: The best place to learn data engineering. Built and maintained by the data... | +7 | 63 | Established |
| 10 | apache/hop: Hop Orchestration Platform | +7 | 76 | Verified |
| 11 | turbot/steampipe-plugin-github: Use SQL to instantly query repositories, users, gists and more from GitHub.... | +7 | 59 | Established |
| 12 | wgzhao/Addax: A fast and versatile ETL tool that can transfer data between RDBMS and NoSQL... | +7 | 71 | Verified |
| 13 | opensnowcat/opensnowcat-enrich: OpenSnowcat Enricher (Apache 2.0 License) | +7 | 53 | Established |
| 14 | wilson-mok/demo: In this repository, you will find varies demo and presentations I have... | +7 | 46 | Emerging |
| 15 | pandabear-neil/microsoft_fabric_mods: Code Snippets, Designs, and other things about building a Data Analytics... | +7 | 24 | Experimental |
| 16 | salimt/Transfermarkt-ETL-and-LIVE-Scores: asyncIO, Github Actions, GCP, dbt, Terraform, Docker | +7 | 25 | Experimental |
| 17 | dagster-io/community-integrations: Community supported integrations for the Dagster platform. | +7 | 58 | Established |
| 18 | ccao-data/data-architecture: Codebase for CCAO data infrastructure construction and management | +7 | 41 | Emerging |
| 19 | tosh2230/stairlight: A data lineage tool detects table dependencies from rendered SQL statements. | +7 | 23 | Experimental |
| 20 | RustedBytes/audios-to-dataset: Convert your audio files into DuckDB or Parquet files | +7 | 41 | Emerging |
| 21 | Data-Research-Analysis/data-research-analysis-platform: Stop Guessing. Start Dominating Your Market. The only data platform built... | +7 | 46 | Emerging |
| 22 | jtakish/airflow-provider-sap-hana: Airflow provider package for SAP HANA | +7 | 37 | Emerging |
| 23 | odpi/egeria-docs: Documentation repository for the Egeria project. | +7 | 55 | Established |
| 24 | catalyst-cooperative/pudl: The Public Utility Data Liberation Project provides analysis-ready energy... | +7 | 76 | Verified |
| 25 | fkie-cad/Logprep: log data pre processing, generation and shipping in python | +7 | 62 | Established |
| 26 | datavane/tis: Support agile DataOps Based on Flink, DataX and Flink-CDC, Chunjun with Web-UI | +7 | 71 | Verified |
| 27 | dataform-co/dataform: Dataform is a framework for managing SQL based data operations in BigQuery | +7 | 71 | Verified |
| 28 | apecloud/ape-dts: ApeCloud's Data Transfer Suite, written in Rust. Provides ultra-fast data... | +7 | 68 | Established |
| 29 | DataSQRL/sqrl: Data Pipeline Automation Framework to build MCP servers, data APIs, and data... | +7 | 55 | Established |
| 30 | GovHub-br/gov-hub: GovHub - Transforming Data into Value for Public Management | +7 | 53 | Established |
| 31 | SQLMesh/sqlmesh: Scalable and efficient data transformation framework - backwards compatible with dbt. | +7 | 67 | Established |
| 32 | digitalghost-dev/poke-cli: A hybrid CLI/TUI tool written in Go for viewing Pokémon data from the... | +7 | 42 | Emerging |
| 33 | peter115342/soccer-tracker-DE-project: End-To-End Data Engineering Project. Made to learn some common data... | +7 | 35 | Emerging |
| 34 | CategoricalData/CQL: Categorical Query Language IDE | +7 | 45 | Emerging |
| 35 | mehd-io/pypi-duck-flow: end-to-end data engineering project to get insights from PyPi using python,... | +7 | 50 | Established |
| 36 | ankiano/etl: Extract transform load CLI tool for extracting small and middle data volume... | +7 | 41 | Emerging |
| 37 | Edwardvaneechoud/Flowfile: Flowfile is a visual ETL tool and Python library combining drag-and-drop... | +7 | 52 | Established |
| 38 | rudderlabs/rudder-server: Privacy and Security focused Segment-alternative, in Golang and React | +7 | 58 | Established |
| 39 | Multiwoven/multiwoven: 🔥🔥🔥 Open source Reverse ETL - alternative to hightouch and census. | +7 | 62 | Established |
| 40 | Indexical-Metrics-Measure-Advisory/watchmen: Watchmen Platform is a low code data platform for data pipeline, meta data... | +7 | 47 | Emerging |
| 41 | dalenewman/Transformalize: Configurable Extract, Transform, and Load | +7 | 59 | Established |
| 42 | tracebloc/data-ingestors: tracebloc data pipeline for training/test dataset setup | +7 | 33 | Emerging |
| 43 | flowsynx/flowsynx: A deterministic orchestrator for composable micro-workflows with reusable modules | +7 | 56 | Established |
| 44 | datacleaner/DataCleaner: The premier open source Data Quality solution | +7 | 67 | Established |
| 45 | illuin-tech/data-pipeline: Library for describing data transformation pipelines by compositing simple... | +7 | 33 | Emerging |
| 46 | MTSWebServices/onetl: One ETL tool to rule them all | +7 | 48 | Emerging |
| 47 | vedanthv/data-engineering-portfolio: Cool DE Projects | +7 | 38 | Emerging |
| 48 | odpi/egeria: Egeria core | +7 | 74 | Verified |
| 49 | ConduitIO/conduit: Conduit streams data between data stores. Kafka Connect replacement. No JVM required. | +7 | 63 | Established |
| 50 | DawnbrandBots/yaml-yugi: A machine-readable, human-editable database of the Yu-Gi-Oh! Trading Card... | +7 | 47 | Emerging |
| 51 | cderickson/Mox-Data.com: Mox-Data.com is a cloud-based data ingestion tool used to process raw data... | +7 | 36 | Emerging |
| 52 | datazip-inc/olake-ui: Frontend & BFF (Backend for frontend) for Olake. This includes the UI code... | +7 | 57 | Established |
| 53 | bbossgroups/bboss-elastic-tran: bboss-datatran is an open-source data collection and unified stream-batch tool from bboss, providing data collection, cleaning and transformation, and unified stream-batch computation;... | +7 | 49 | Emerging |
| 54 | moj-analytical-services/etl_manager: A python package to create a database on the platform using our moj data... | +7 | 44 | Emerging |
| 55 | wgzhao/addax-admin: Addax Admin is a web-based management console for Addax ETL jobs, offering... | +7 | 51 | Established |
| 56 | catalyst-cooperative/ferc-xbrl-extractor: A tool for converting FERC filings published in XBRL into SQLite databases | +7 | 48 | Emerging |
| 57 | SpareCores/sc-crawler: Pull and standardize data on cloud compute resources. | +7 | 49 | Emerging |
| 58 | hbz/lobid-resources: Transformation, web frontend, and API for the hbz catalog as LOD | +7 | 50 | Established |
| 59 | stn1slv/awesome-integration: A curated list of awesome system integration software and resources. | +7 | 62 | Established |
| 60 | raphaelberly/journal: A movie journal coupled with open IMDb data, and a Flask web-app for easy... | +7 | 26 | Experimental |
| 61 | MTSWebServices/syncmaster: No-code ETL tool, based on onETL + PySpark | +7 | 41 | Emerging |
| 62 | osalvador/ReplicaDB: ReplicaDB is open source tool for database replication, designed for... | +7 | 63 | Established |
| 63 | AbsaOSS/pramen: Resilient data pipeline framework running on Apache Spark | +7 | 46 | Emerging |
| 64 | HTTP-RPC/Kilo: Lightweight REST for Java | +7 | 60 | Established |
| 65 | starlake-ai/starlake: Declarative text based tool for data analysts and engineers to extract,... | +7 | 56 | Established |
| 66 | jordilin/gitar: Git all remotes. git cli tool that targets both Github and Gitlab | +7 | 47 | Emerging |
| 67 | DawnbrandBots/yaml-yugipedia: An automatically-updated collection of wikitexts from Yugipedia. Part of YAML Yugi. | +7 | 42 | Emerging |
| 68 | apache/incubator-devlake-playground: Apache DevLake is an open-source dev data platform to ingest, analyze, and... | +7 | 47 | Emerging |
| 69 | MTSWebServices/horizon: Simple HWM Store backend | +7 | 34 | Emerging |
| 70 | prefeitura-rio/pipelines_rj_smtr: Code for capturing and processing SMTR data | +7 | 39 | Emerging |
| 71 | sul-dlss/libsys-airflow: Airflow DAGS for migrating and managing ILS data into FOLIO along with other... | +7 | 33 | Emerging |
| 72 | MTSWebServices/etl-entities: Basic ETL Entity classes for onETL | +7 | 33 | Emerging |
| 73 | tenzir/library: Packages for the Tenzir ecosystem. | +7 | 39 | Emerging |
| 74 | MTSWebServices/horizon-hwm-store: Horizon HWM Store for onETL | +7 | 32 | Emerging |
| 75 | MTSWebServices/syncmaster-ui: Frontend for Syncmaster, no-code ETL tool. WIP | +7 | 44 | Emerging |
| 76 | PeopleForBikes/brokenspoke: A collection of tools for the BNA. | +7 | 46 | Emerging |
| 77 | nationalarchives/ds-caselaw-ingester: Parse judgements from the Transformation Engine and load them into MarkLogic... | +7 | 44 | Emerging |
| 78 | ohs-foundation/fhir-data-pipes: A collection of tools for extracting FHIR resources and analytics services... | +7 | 63 | Established |
| 79 | Breeze0806/go-etl: go-etl is a toolset for data extraction, transformation and loading. | +7 | 61 | Established |
| 80 | opendatadiscovery/odd-platform: First open-source data discovery and observability platform. We make a life... | +7 | 62 | Established |
| 81 | bitol-io/open-data-contract-standard: Home of the Open Data Contract Standard (ODCS). | +7 | 60 | Established |
| 82 | Desbordante/desbordante-core: Desbordante is a high-performance data profiler that is capable of... | +7 | 63 | Established |
| 83 | dagster-io/dagster-open-platform: Dagster Labs' open-source data platform, built with Dagster. | +7 | 47 | Emerging |
| 84 | DataRecce/recce: The data-validation toolkit for enhanced dbt (data build tool) PR review | +7 | 52 | Established |
| 85 | snowflakedb/snowpark-python: Snowflake Snowpark Python API | +7 | 64 | Established |
| 86 | airbytehq/PyAirbyte: PyAirbyte brings the power of Airbyte to every Python developer. | +7 | 62 | Established |
| 87 | halestudio/hale: (Spatial) data harmonisation with hale»studio (formerly HUMBOLDT Alignment Editor) | +7 | 52 | Established |
| 88 | kanton-bern/hellodata-be: The Open-Source Enterprise Data Platform in a single Portal | +7 | 54 | Established |
| 89 | chalk-ai/chalk-go: Go client for Chalk | +7 | 41 | Emerging |
| 90 | DataKitchen/dataops-observability-agents: DataOps Observability Integration Agents are part of DataKitchen's Open... | +7 | 40 | Emerging |
| 91 | dfpc-coe/CloudTAK: TAK Compatible, browser based Common Operation Picture & Situational Awareness tool | +7 | 57 | Established |
| 92 | DataKitchen/data-observability-installer: Installer for DataKitchen's Open Source Data Observability Products. Data... | +7 | 52 | Established |
| 93 | dlt-hub/verified-sources: Contribute to dlt verified sources 🔥 | +7 | 61 | Established |
| 94 | ashish10alex/vscode-dataform-tools: Dataform Tools - VS Code extension to run and visualise Dataform data... | +7 | 50 | Established |
| 95 | fedspendingtransparency/usaspending-api: Server application to serve U.S. federal spending data via a RESTful API | +7 | 64 | Established |
| 96 | dataflint/spark: Drop-in replacement for Apache Spark UI | +7 | 57 | Established |
| 97 | xxh/xxh-shell-xonsh: Use @xonsh wherever you go through the SSH without installation on the host. | +7 | 51 | Established |
| 98 | biglocalnews/warn-scraper: Command-line interface for downloading WARN Act notices of qualified plant... | +7 | 64 | Established |
| 99 | realdatadriven/etlx: ETL / ELT Framework powered by DuckDB, designed to seamlessly integrate and... | +7 | 44 | Emerging |
| 100 | Bruno-Furtado/cloud-cnpj: Ingestion, preparation, and free provision of CNPJ data from... | +7 | 52 | Established |