All Data Engineering Tools

1,297 tools ranked by quality score · Page 3 of 13

Showing 201–300 of 1,297
# Tool Score Tier
201 bitol-io/open-data-product-standard

Home of the Open Data Product Standard (ODPS).

51
Established
202 xxh/xxh-shell-xonsh

Use @xonsh wherever you go over SSH, without installing it on the host.

51
Established
203 turbot/steampipe-plugin-cloudflare

Use SQL to instantly query accounts, zones and more from Cloudflare. Open...

51
Established
204 StructuredLabs/preswald

Preswald is a WASM packager for Python-based interactive data apps: bundle...

51
Established
205 aws-samples/uncovering-hidden-connections-in-unstructured-financial-data

Uncovering Hidden Connections in Unstructured Financial Data using Amazon...

51
Established
206 byzer-org/byzer-lang

Byzer (formerly MLSQL): A low-code open-source programming language for data...

51
Established
207 turbot/steampipe-plugin-googleworkspace

Use SQL to instantly query calendar events, drive files, gmail messages, and...

51
Established
208 edanalytics/earthmover

CLI tool for transforming collections of tabular source data into a variety...

51
Established
209 turbot/steampipe-plugin-net

Use SQL to instantly query DNS records, certificates and other network...

51
Established
210 MLT-OSS/FirstData

The World's Most Comprehensive, Authoritative, and Structured Open Source...

51
Established
211 wgzhao/addax-admin

Addax Admin is a web-based management console for Addax ETL jobs, offering...

51
Established
212 ogbinar/DataEngineeringPilipinas

Data Engineering Pilipinas is a community for data engineers, data analysts,...

50
Established
213 ashish10alex/vscode-dataform-tools

Dataform Tools - VS Code extension to run and visualise Dataform data...

50
Established
214 build-on-aws/rag-postgresql-agent-bedrock

This application is built in four stages using infrastructure as code with...

50
Established
215 robert-koch-institut/mex-common

RKI Metadata Exchange | Software development toolkit for the MEx project...

50
Established
216 mehd-io/pypi-duck-flow

End-to-end data engineering project to get insights from PyPI using Python,...

50
Established
217 turbot/steampipe-plugin-stripe

Use SQL to instantly query customers, products, invoices and more from...

50
Established
218 hbz/lobid-resources

Transformation, web frontend, and API for the hbz catalog as LOD

50
Established
219 JuliaML/TableTransforms.jl

Transforms and pipelines with tabular data in Julia

50
Established
220 GoPlasmatic/dataflow-rs

A high-performance rules engine for IFTTT-style automation in Rust with...

50
Established
221 turbot/steampipe-plugin-zendesk

Use SQL to instantly query Zendesk. Open source CLI. No DB required.

50
Established
222 turbot/steampipe-plugin-oci

Use SQL to instantly query Oracle Cloud resources across regions and...

50
Established
223 turbot/steampipe-plugin-datadog

Use SQL to instantly query Datadog resources across accounts. Open source...

50
Established
224 turbot/steampipe-plugin-salesforce

Use SQL to instantly query Salesforce resources. Open source CLI. No DB required.

50
Established
225 dtmirizzi/target-elasticsearch

A Meltano target for Elasticsearch

50
Established
226 pplu/aws-sdk-perl

A community AWS SDK for Perl Programmers

50
Established
227 turbot/steampipe-plugin-prometheus

Use SQL to instantly query Prometheus metrics, alerts, labels and more. Open...

50
Established
228 GovHub-br/data-application-gov-hub

Gov-Hub data pipeline

50
Established
229 bbossgroups/bboss-elastic-tran

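bboss-datatran is an open-source data collection and unified stream-batch processing tool from bboss, providing data collection, cleaning and transformation, and unified stream-batch computation;...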

49
Emerging
230 bradfitz/embiggen-disk

embiggen-disk live-resizes a filesystem after first live-resizing any...

49
Emerging
231 bmeares/Meerschaum

Create and manage data pipes with Meerschaum.

49
Emerging
232 weifuwan/seatunnel-web

SeaTunnel Web is a visual platform for building, managing, and monitoring...

49
Emerging
233 turbot/steampipe-plugin-terraform

Use SQL to instantly query resources, data sources and more from Terraform...

49
Emerging
234 turbot/steampipe-plugin-okta

Use SQL to instantly query users, groups, applications and more from Okta....

49
Emerging
235 thadhutch/sports-quant

End-to-end NFL data pipeline that scrapes PFF grades and Pro Football...

49
Emerging
236 hi-primus/optimus

Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF,...

49
Emerging
237 turbot/steampipe-export

Steampipe Export is a zero-ETL CLI to fetch data from cloud services and...

49
Emerging
238 MilkMp/CIA-World-Factbooks-Archive-1990-2025

Complete structured archive of every CIA World Factbook edition from...

49
Emerging
239 irajhedayati/data-engineering

A set of Data Engineering tools online for public use

49
Emerging
240 turbot/steampipe-plugin-microsoft365

Use SQL to instantly query calendars, contacts, drives, mailboxes and more...

49
Emerging
241 DataBora/elusion

DataFrame / Data Engineering Library with familiar syntax like ones we love:...

49
Emerging
242 SpareCores/sc-crawler

Pull and standardize data on cloud compute resources.

49
Emerging
243 Pipelex/pipelex-cookbook

Cookbook for Pipelex, the declarative language for composable AI workflows....

49
Emerging
244 mc2-project/opaque-sql

An encrypted data analytics platform

49
Emerging
245 turbot/steampipe-plugin-csv

Use SQL to instantly query data from CSV files. Open source CLI. No DB required.

49
Emerging
246 capitalone/DataProfiler

What's in your data? Extract schema, statistics and entities from datasets

49
Emerging
247 koralium/flowtide

High-performance streaming SQL query engine designed for real-time data...

49
Emerging
248 aartikis/RTEC

RTEC is an Event Calculus implementation optimised for stream reasoning

48
Emerging
249 turbot/steampipe-plugin-rss

Use SQL to instantly query RSS channels and Atom Feeds. Open source CLI. No...

48
Emerging
250 scribe-org/Scribe-Server

Backend service for Scribe data downloads

48
Emerging
251 myriade-ai/myriade

AI Native Data Platform: explore, clean, transform and govern your data...

48
Emerging
252 turbot/steampipe-plugin-config

Use SQL to instantly query data from various types of config files. Open...

48
Emerging
253 databricks-industry-solutions/python-data-sources

Quality python data sources for pyspark 4.x

48
Emerging
254 turbot/steampipe-plugin-jenkins

Use SQL to instantly query Jenkins resources. Open source CLI. No DB required.

48
Emerging
255 turbot/steampipe-plugin-mastodon

Use SQL to instantly query Mastodon resources. Open source CLI. No DB required.

48
Emerging
256 turbot/steampipe-plugin-reddit

Use SQL to instantly query Reddit posts, comments & more. Open source CLI....

48
Emerging
257 edrewitz/WxData

A Python library that acts as a client to download, pre-process and...

48
Emerging
258 catalyst-cooperative/ferc-xbrl-extractor

A tool for converting FERC filings published in XBRL into SQLite databases

48
Emerging
259 SETL-Framework/setl

A simple Spark-powered ETL framework that just works 🍺

48
Emerging
260 NeaByteLab/IDX-API

Indonesian Stock Exchange API wrapper for trading data integration.

48
Emerging
261 turbot/steampipe-plugin-shodan

Use SQL to instantly query host, DNS and exploit information using Shodan....

48
Emerging
262 tuanx18/data-engineer-portfolio

This is a repository to demonstrate my details, skills, projects and to keep...

48
Emerging
263 turbot/steampipe-sqlite

Steampipe SQLite is a zero-ETL engine for SQLite. Virtual tables translate...

48
Emerging
264 sql-machine-learning/sqlflow

Brings SQL and AI together.

48
Emerging
265 FalkorDB/falkordb-ts

FalkorDB Typescript Client

48
Emerging
266 leftkats/awesome-greek-tech-jobs

A comprehensive map of companies that hire for tech jobs in Greece.

48
Emerging
267 MTSWebServices/onetl

One ETL tool to rule them all

48
Emerging
268 feathr-ai/feathr

Feathr – A scalable, unified data and AI engineering platform for enterprise

47
Emerging
269 turbot/steampipe-plugin-openai

Use SQL to instantly query OpenAI for completions, models & more. Open...

47
Emerging
270 turbot/steampipe-plugin-hypothesis

Use SQL to instantly query Hypothesis resources. Open source CLI. No DB required.

47
Emerging
271 jordilin/gitar

Git all remotes. git cli tool that targets both Github and Gitlab

47
Emerging
272 apache/incubator-devlake-playground

Apache DevLake is an open-source dev data platform to ingest, analyze, and...

47
Emerging
273 turbot/steampipe-plugin-circleci

Use SQL to instantly query projects, pipelines, builds and more from...

47
Emerging
274 snowplow/dbt-snowplow-normalize

A dbt package to support modelling event data via split tables for use in...

47
Emerging
275 libredb/libredb-studio

A modern, blazing-fast SQL IDE for the cloud era. Query PostgreSQL, MySQL,...

47
Emerging
276 joonan-lab/cwas

Category-wide association study (CWAS) (Werling et al., 2018; An et al., 2018)

47
Emerging
277 dagster-io/dagster-open-platform

Dagster Labs' open-source data platform, built with Dagster.

47
Emerging
278 abdubakr77/deepcsv

Automatically processes data files in directories, converts array-like...

47
Emerging
279 J0SAL/Decentralized-Expense-Tracker

Tracking Expenses Securely

47
Emerging
280 DawnbrandBots/yaml-yugi

A machine-readable, human-editable database of the Yu-Gi-Oh! Trading Card...

47
Emerging
281 buildersoftio/cortex

Cortex | Data Framework—a cutting-edge SDK that simplifies real-time data...

47
Emerging
282 turbot/steampipe-plugin-bitbucket

Use SQL to instantly query Bitbucket. Open source CLI. No DB required.

47
Emerging
283 Indexical-Metrics-Measure-Advisory/watchmen

Watchmen Platform is a low code data platform for data pipeline, meta data...

47
Emerging
284 elastiflow/pipelines

A lightweight Go framework for building stateful, real-time data pipelines....

47
Emerging
285 turbot/steampipe-plugin-wiz

Use SQL to instantly query Wiz resources. Open source CLI. No DB required.

46
Emerging
286 turbot/steampipe-plugin-pagerduty

Use SQL to instantly query resources from PagerDuty. Open source CLI. No DB required.

46
Emerging
287 wilson-mok/demo

In this repository, you will find various demos and presentations I have...

46
Emerging
288 turbot/steampipe-plugin-crowdstrike

Use SQL to instantly query CrowdStrike resources. Open source CLI. No DB required.

46
Emerging
289 monarch-initiative/koza

Data transformation framework for LinkML data models

46
Emerging
290 AbsaOSS/pramen

Resilient data pipeline framework running on Apache Spark

46
Emerging
291 opensnowcat/opensnowcat-collector

OpenSnowcat Collector, an open source fork of Snowplow (Apache 2.0 License)

46
Emerging
292 MLD3/FIDDLE

FlexIble Data-Driven pipeLinE – a preprocessing pipeline that transforms...

46
Emerging
293 PeopleForBikes/brokenspoke

A collection of tools for the BNA.

46
Emerging
294 Data-Research-Analysis/data-research-analysis-platform

Stop Guessing. Start Dominating Your Market. The only data platform built...

46
Emerging
295 alexei-led/spotinfo

CLI for exploring AWS EC2 Spot inventory. Inspect AWS Spot instance types,...

46
Emerging
296 wherobots/airflow-providers-wherobots

Airflow extensions for communicating with Wherobots Cloud

46
Emerging
297 BlazingDB/blazingsql

BlazingSQL is a lightweight, GPU accelerated, SQL engine for Python. Built...

46
Emerging
298 mikevan666/opendataworks

opendataworks...

46
Emerging
299 mattlianje/etl4s

Powerful, whiteboard-style ETL

46
Emerging
300 turbot/steampipe-plugin-exec

Use SQL to instantly query & run shell commands on local & remote servers....

46
Emerging