All Data Engineering Tools
1,297 tools ranked by quality score · Page 3 of 13
| # | Tool | Score | Tier |
|---|---|---|---|
| 201 | bitol-io/open-data-product-standard: Home of the Open Data Product Standard (ODPS). | | Established |
| 202 | xxh/xxh-shell-xonsh: Use @xonsh wherever you go over SSH, without installing it on the host. | | Established |
| 203 | turbot/steampipe-plugin-cloudflare: Use SQL to instantly query accounts, zones and more from Cloudflare. Open... | | Established |
| 204 | StructuredLabs/preswald: Preswald is a WASM packager for Python-based interactive data apps: bundle... | | Established |
| 205 | aws-samples/uncovering-hidden-connections-in-unstructured-financial-data: Uncovering Hidden Connections in Unstructured Financial Data using Amazon... | | Established |
| 206 | byzer-org/byzer-lang: Byzer (formerly MLSQL): A low-code open-source programming language for data... | | Established |
| 207 | turbot/steampipe-plugin-googleworkspace: Use SQL to instantly query calendar events, drive files, gmail messages, and... | | Established |
| 208 | edanalytics/earthmover: CLI tool for transforming collections of tabular source data into a variety... | | Established |
| 209 | turbot/steampipe-plugin-net: Use SQL to instantly query DNS records, certificates and other network... | | Established |
| 210 | MLT-OSS/FirstData: The World's Most Comprehensive, Authoritative, and Structured Open Source... | | Established |
| 211 | wgzhao/addax-admin: Addax Admin is a web-based management console for Addax ETL jobs, offering... | | Established |
| 212 | ogbinar/DataEngineeringPilipinas: Data Engineering Pilipinas is a community for data engineers, data analysts,... | | Established |
| 213 | ashish10alex/vscode-dataform-tools: Dataform Tools - VS Code extension to run and visualise Dataform data... | | Established |
| 214 | build-on-aws/rag-postgresql-agent-bedrock: This application is built in four stages using infrastructure as code with... | | Established |
| 215 | robert-koch-institut/mex-common: RKI Metadata Exchange \| Software development toolkit for the MEx project... | | Established |
| 216 | mehd-io/pypi-duck-flow: End-to-end data engineering project to get insights from PyPI using Python,... | | Established |
| 217 | turbot/steampipe-plugin-stripe: Use SQL to instantly query customers, products, invoices and more from... | | Established |
| 218 | hbz/lobid-resources: Transformation, web frontend, and API for the hbz catalog as LOD | | Established |
| 219 | JuliaML/TableTransforms.jl: Transforms and pipelines with tabular data in Julia | | Established |
| 220 | GoPlasmatic/dataflow-rs: A high-performance rules engine for IFTTT-style automation in Rust with... | | Established |
| 221 | turbot/steampipe-plugin-zendesk: Use SQL to instantly query Zendesk. Open source CLI. No DB required. | | Established |
| 222 | turbot/steampipe-plugin-oci: Use SQL to instantly query Oracle Cloud resources across regions and... | | Established |
| 223 | turbot/steampipe-plugin-datadog: Use SQL to instantly query Datadog resources across accounts. Open source... | | Established |
| 224 | turbot/steampipe-plugin-salesforce: Use SQL to instantly query Salesforce resources. Open source CLI. No DB required. | | Established |
| 225 | dtmirizzi/target-elasticsearch: A Meltano target for Elasticsearch | | Established |
| 226 | pplu/aws-sdk-perl: A community AWS SDK for Perl Programmers | | Established |
| 227 | turbot/steampipe-plugin-prometheus: Use SQL to instantly query Prometheus metrics, alerts, labels and more. Open... | | Established |
| 228 | GovHub-br/data-application-gov-hub: Gov-Hub data pipeline | | Established |
| 229 | bbossgroups/bboss-elastic-tran: bboss-datatran is an open-source data collection and unified stream/batch processing tool from bboss, providing data collection, cleaning and transformation, and unified stream/batch computation;... | | Emerging |
| 230 | bradfitz/embiggen-disk: embiggen-disk live-resizes a filesystem after first live-resizing any... | | Emerging |
| 231 | bmeares/Meerschaum: Create and manage data pipes with Meerschaum. | | Emerging |
| 232 | weifuwan/seatunnel-web: SeaTunnel Web is a visual platform for building, managing, and monitoring... | | Emerging |
| 233 | turbot/steampipe-plugin-terraform: Use SQL to instantly query resources, data sources and more from Terraform... | | Emerging |
| 234 | turbot/steampipe-plugin-okta: Use SQL to instantly query users, groups, applications and more from Okta.... | | Emerging |
| 235 | thadhutch/sports-quant: End-to-end NFL data pipeline that scrapes PFF grades and Pro Football... | | Emerging |
| 236 | hi-primus/optimus: Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF,... | | Emerging |
| 237 | turbot/steampipe-export: Steampipe Export is a zero-ETL CLI to fetch data from cloud services and... | | Emerging |
| 238 | MilkMp/CIA-World-Factbooks-Archive-1990-2025: Complete structured archive of every CIA World Factbook edition from... | | Emerging |
| 239 | irajhedayati/data-engineering: A set of Data Engineering tools online for public use | | Emerging |
| 240 | turbot/steampipe-plugin-microsoft365: Use SQL to instantly query calendars, contacts, drives, mailboxes and more... | | Emerging |
| 241 | DataBora/elusion: DataFrame / Data Engineering Library with familiar syntax like ones we love:... | | Emerging |
| 242 | SpareCores/sc-crawler: Pull and standardize data on cloud compute resources. | | Emerging |
| 243 | Pipelex/pipelex-cookbook: Cookbook for Pipelex, the declarative language for composable AI workflows.... | | Emerging |
| 244 | mc2-project/opaque-sql: An encrypted data analytics platform | | Emerging |
| 245 | turbot/steampipe-plugin-csv: Use SQL to instantly query data from CSV files. Open source CLI. No DB required. | | Emerging |
| 246 | capitalone/DataProfiler: What's in your data? Extract schema, statistics and entities from datasets | | Emerging |
| 247 | koralium/flowtide: High-performance streaming SQL query engine designed for real-time data... | | Emerging |
| 248 | aartikis/RTEC: RTEC is an Event Calculus implementation optimised for stream reasoning | | Emerging |
| 249 | turbot/steampipe-plugin-rss: Use SQL to instantly query RSS channels and Atom Feeds. Open source CLI. No... | | Emerging |
| 250 | scribe-org/Scribe-Server: Backend service for Scribe data downloads | | Emerging |
| 251 | myriade-ai/myriade: AI Native Data Platform: explore, clean, transform and govern your data... | | Emerging |
| 252 | turbot/steampipe-plugin-config: Use SQL to instantly query data from various types of config files. Open... | | Emerging |
| 253 | databricks-industry-solutions/python-data-sources: Quality Python data sources for PySpark 4.x | | Emerging |
| 254 | turbot/steampipe-plugin-jenkins: Use SQL to instantly query Jenkins resources. Open source CLI. No DB required. | | Emerging |
| 255 | turbot/steampipe-plugin-mastodon: Use SQL to instantly query Mastodon resources. Open source CLI. No DB required. | | Emerging |
| 256 | turbot/steampipe-plugin-reddit: Use SQL to instantly query Reddit posts, comments & more. Open source CLI.... | | Emerging |
| 257 | edrewitz/WxData: A Python library that acts as a client to download, pre-process and... | | Emerging |
| 258 | catalyst-cooperative/ferc-xbrl-extractor: A tool for converting FERC filings published in XBRL into SQLite databases | | Emerging |
| 259 | SETL-Framework/setl: A simple Spark-powered ETL framework that just works 🍺 | | Emerging |
| 260 | NeaByteLab/IDX-API: Indonesian Stock Exchange API wrapper for trading data integration. | | Emerging |
| 261 | turbot/steampipe-plugin-shodan: Use SQL to instantly query host, DNS and exploit information using Shodan.... | | Emerging |
| 262 | tuanx18/data-engineer-portfolio: This is a repository to demonstrate my details, skills, projects and to keep... | | Emerging |
| 263 | turbot/steampipe-sqlite: Steampipe SQLite is a zero-ETL engine for SQLite. Virtual tables translate... | | Emerging |
| 264 | sql-machine-learning/sqlflow: Brings SQL and AI together. | | Emerging |
| 265 | FalkorDB/falkordb-ts: FalkorDB TypeScript Client | | Emerging |
| 266 | leftkats/awesome-greek-tech-jobs: A comprehensive map of companies that hire for tech jobs in Greece. | | Emerging |
| 267 | MTSWebServices/onetl: One ETL tool to rule them all | | Emerging |
| 268 | feathr-ai/feathr: Feathr – A scalable, unified data and AI engineering platform for enterprise | | Emerging |
| 269 | turbot/steampipe-plugin-openai: Use SQL to instantly query OpenAI for completions, models & more. Open... | | Emerging |
| 270 | turbot/steampipe-plugin-hypothesis: Use SQL to instantly query Hypothesis resources. Open source CLI. No DB required. | | Emerging |
| 271 | jordilin/gitar: Git all remotes. Git CLI tool that targets both GitHub and GitLab | | Emerging |
| 272 | apache/incubator-devlake-playground: Apache DevLake is an open-source dev data platform to ingest, analyze, and... | | Emerging |
| 273 | turbot/steampipe-plugin-circleci: Use SQL to instantly query projects, pipelines, builds and more from... | | Emerging |
| 274 | snowplow/dbt-snowplow-normalize: A dbt package to support modelling event data via split tables for use in... | | Emerging |
| 275 | libredb/libredb-studio: A modern, blazing-fast SQL IDE for the cloud era. Query PostgreSQL, MySQL,... | | Emerging |
| 276 | joonan-lab/cwas: Category-wide association study (CWAS) (Werling et al., 2018; An et al., 2018) | | Emerging |
| 277 | dagster-io/dagster-open-platform: Dagster Labs' open-source data platform, built with Dagster. | | Emerging |
| 278 | abdubakr77/deepcsv: Automatically processes data files in directories, converts array-like... | | Emerging |
| 279 | J0SAL/Decentralized-Expense-Tracker: Tracking Expenses Securely | | Emerging |
| 280 | DawnbrandBots/yaml-yugi: A machine-readable, human-editable database of the Yu-Gi-Oh! Trading Card... | | Emerging |
| 281 | buildersoftio/cortex: Cortex \| Data Framework: a cutting-edge SDK that simplifies real-time data... | | Emerging |
| 282 | turbot/steampipe-plugin-bitbucket: Use SQL to instantly query Bitbucket. Open source CLI. No DB required. | | Emerging |
| 283 | Indexical-Metrics-Measure-Advisory/watchmen: Watchmen Platform is a low code data platform for data pipeline, meta data... | | Emerging |
| 284 | elastiflow/pipelines: A lightweight Go framework for building stateful, real-time data pipelines.... | | Emerging |
| 285 | turbot/steampipe-plugin-wiz: Use SQL to instantly query Wiz resources. Open source CLI. No DB required. | | Emerging |
| 286 | turbot/steampipe-plugin-pagerduty: Use SQL to instantly query resources from PagerDuty. Open source CLI. No DB required. | | Emerging |
| 287 | wilson-mok/demo: In this repository, you will find various demos and presentations I have... | | Emerging |
| 288 | turbot/steampipe-plugin-crowdstrike: Use SQL to instantly query CrowdStrike resources. Open source CLI. No DB required. | | Emerging |
| 289 | monarch-initiative/koza: Data transformation framework for LinkML data models | | Emerging |
| 290 | AbsaOSS/pramen: Resilient data pipeline framework running on Apache Spark | | Emerging |
| 291 | opensnowcat/opensnowcat-collector: OpenSnowcat Collector, an open source fork of Snowplow (Apache 2.0 License) | | Emerging |
| 292 | MLD3/FIDDLE: FlexIble Data-Driven pipeLinE – a preprocessing pipeline that transforms... | | Emerging |
| 293 | PeopleForBikes/brokenspoke: A collection of tools for the BNA. | | Emerging |
| 294 | Data-Research-Analysis/data-research-analysis-platform: Stop Guessing. Start Dominating Your Market. The only data platform built... | | Emerging |
| 295 | alexei-led/spotinfo: CLI for exploring AWS EC2 Spot inventory. Inspect AWS Spot instance types,... | | Emerging |
| 296 | wherobots/airflow-providers-wherobots: Airflow extensions for communicating with Wherobots Cloud | | Emerging |
| 297 | BlazingDB/blazingsql: BlazingSQL is a lightweight, GPU accelerated, SQL engine for Python. Built... | | Emerging |
| 298 | mikevan666/opendataworks: opendataworks... | | Emerging |
| 299 | mattlianje/etl4s: Powerful, whiteboard-style ETL | | Emerging |
| 300 | turbot/steampipe-plugin-exec: Use SQL to instantly query & run shell commands on local & remote servers.... | | Emerging |