All Data Engineering Tools

1,297 tools ranked by quality score · Page 5 of 13

Showing 401–500 of 1,297
# Tool Score Tier
401 turbot/steampipe-plugin-ldap

Use SQL to instantly query users, groups, OUs and more from LDAP. Open...

41
Emerging
402 AbstractionsLab/satrap-dl

SATRAP-DL (Semi-Automated Threat Reconnaissance and Analysis Powered by...

41
Emerging
403 Edwardvaneechoud/pyfloe

A minimal zero dependency dataframe library

41
Emerging
404 ivszhuravlev/spark-tuning-handbook

Hands-on Spark internals and performance engineering.

41
Emerging
405 Oatapza/libredb-studio

🛠️ Build and manage SQL databases effortlessly with LibreDB Studio, the...

41
Emerging
406 jrlasak/awesome-databricks

170+ curated resources every Databricks Data Engineer should bookmark -...

41
Emerging
407 OpenAF/oafp

Command-line tool that takes an input, usually a data structure (e.g. json),...

41
Emerging
408 synmetrix/synmetrix

Synmetrix – production-ready open source semantic layer on Cube

41
Emerging
409 markusbegerow/data-analytics-exercises

End-to-end data warehouse exercises for students - build a modern ELT...

41
Emerging
410 neokd/DataStorehouse

DataStoreHouse is an open-source project that aims to create a collaborative...

41
Emerging
411 turbot/steampipe-plugin-twitter

Use SQL to instantly query tweets, users and followers from Twitter. Open...

40
Emerging
412 hasna/connectors

Open source API connectors

40
Emerging
413 VincenzoImp/job-search-tool

Automated job search and analysis tool powered by JobSpy. Features...

40
Emerging
414 crackcell/hpipe

Workflow engine for various computing systems.

40
Emerging
415 mahmoudparsian/data-warehousing

This repository is a place for the Data Warehousing course at the...

40
Emerging
416 turbot/steampipe-plugin-googlesheets

Use SQL to instantly query spreadsheets, sheets, and cell data from Google...

40
Emerging
417 mindsdb/dbt-mindsdb

dbt adapter for connecting to MindsDB

40
Emerging
418 Canner/vulcan-sql

Data API Framework for AI Agents and Data Apps

40
Emerging
419 fairtracks/omnipy

Omnipy is a high level Python library for type-driven data wrangling and...

40
Emerging
420 wtbates99/tabletalk

tabletalk is a declarative language for seamless interaction with your...

40
Emerging
421 govtech-data-practice/vowl

A validation engine for Open Data Contract Standard (ODCS) data contracts....

40
Emerging
422 DataKitchen/dataops-observability-agents

DataOps Observability Integration Agents are part of DataKitchen's Open...

40
Emerging
423 nshkrdotcom/flowstone

Asset-first data orchestration for Elixir/BEAM. Dagster-inspired with OTP...

40
Emerging
424 Vetdatahub/VetDataHub

VetDataHub is an open-source veterinary dataset repository dedicated to...

40
Emerging
425 sevapru/terrorblade

A unified data extraction and parsing platform for messaging platforms. It...

40
Emerging
426 Amber-Williams/hackernews-whos-hiring

Real-time SQL database from Hacker News "hiring" thread

40
Emerging
427 tenzir/library

Packages for the Tenzir ecosystem.

39
Emerging
428 nightmarewalker/D-MemFS

In-process virtual filesystem with hard quota for Python

39
Emerging
429 mlr-org/mlr3db

Data backends to let mlr3 work transparently with (remote) databases

39
Emerging
430 AbdullahEmad22/realtime-data-engineering-project

An end-to-end data engineering pipeline that orchestrates data ingestion,...

39
Emerging
431 AlvaroCavalcante/airflow-parse-bench

Stop creating bad DAGs! Use this tool to measure and compare the parse time...

39
Emerging
432 continuous-dems/fetchez

Fetchez is a lightweight, modular, and highly extendable Python framework...

39
Emerging
433 prefeitura-rio/pipelines_rj_smtr

Code for capturing and processing SMTR data

39
Emerging
434 exasol/exasol-personal

The High-Performance Analytics Engine — Free for Personal Use

39
Emerging
435 turbot/steampipe-plugin-virustotal

Use SQL to instantly query file, domain, URL and IP scanning results from VirusTotal.

39
Emerging
436 stitchfix/hamilton

A scalable general purpose micro-framework for defining dataflows. THIS...

39
Emerging
437 onlozanoo/databroom

Databroom is a cross-language data cleaning tool with CLI, GUI, and API....

38
Emerging
438 kasztp/dbx-exam-guide

Databricks Certifications - Exam prep guide

38
Emerging
439 root-11/tablite

Multiprocessing-enabled, out-of-memory data analysis library for tabular data.

38
Emerging
440 gopidesupavan/qualink

Data quality validation, profiling, anomaly detection, and YAML-driven...

38
Emerging
441 ottogroup/koality

Library for data quality monitoring based on DuckDB.

38
Emerging
442 BEKO2210/World_report

A self-updating global dashboard that aggregates 40+ open data sources...

38
Emerging
443 SunnyX6/Datapillar

Raw In, Golden Wings Out

38
Emerging
444 rush-db/rushdb

RushDB is an Instant Database for Modern Apps & AI. Built on top of Neo4j.

38
Emerging
445 drake69/spendify

🏦 Personal finance ledger — aggregates bank statements (CSV/XLSX) into a...

38
Emerging
446 BigData-Ananlysiser/UGC-Analysiser

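An open-source full-stack big data project, mainly covering real-time data collection, machine learning, big data processing, and front-end visualization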

38
Emerging
447 aasouzaconsult/portfolio-dados

Repository of data analysis projects (seeking value in data!!!)

38
Emerging
448 Thyznol/firefly-iii-Pico-Data-Importer

The Firefly III Data Importer can import data into Firefly, Automatically...

38
Emerging
449 chnm/bom

Website files, database GUI, and data pipeline scripts for the London Bills...

38
Emerging
450 bogwi/sarpro

Blazing-fast Sentinel‑1 Synthetic Aperture Radar (SAR) GRD to GeoTIFF/JPEG...

38
Emerging
451 kameshsampath/postgis-snowflake-intelligence-demo

This demo showcases a production-ready architecture for managing smart city...

38
Emerging
452 vedanthv/data-engineering-portfolio

Cool DE Projects

38
Emerging
453 mbari-org/aidata

(ETL) Extract, transform, load/download and augment images and annotations...

38
Emerging
454 polarbase-team/polarbase

Extensible Open-source Data Backend for PostgreSQL. Features a multi-view UI...

37
Emerging
455 jtakish/airflow-provider-sap-hana

Airflow provider package for SAP HANA

37
Emerging
456 atolcd/sdis-remocra

🔥 Remocra - Open-source business platform designed by and for the SDIS.

37
Emerging
457 polakowo/datadocs

Documentation for data enthusiasts

37
Emerging
458 kevin-hanselman/dud

A lightweight CLI tool for versioning data alongside source code and...

37
Emerging
459 savantly-net/nexus-command

FOSS ERP - data management, automation, and integration for any business....

37
Emerging
460 empowerai/fs-middlelayer-api

US Forest Service ePermit API

37
Emerging
461 SwellDB/SwellDB

The data system that answers anything.

37
Emerging
462 bitroot/coflux

Open-source workflow engine. Orchestrate and observe computational workflows...

37
Emerging
463 sergio11/covid_tweets_etl_architecture

📚🧪 This is a learning-focused POC that explores a microservices ETL...

37
Emerging
464 SentryPeer/SentryPeerHQ

Fraud Detection for VoIP. Use SentryPeer® HQ to help prevent VoIP...

36
Emerging
465 viadee/camunda-kafka-polling-client

Stream your process history to Kafka

36
Emerging
466 tshu-w/DBCopilot

Code and data for the paper "DBCᴏᴘɪʟᴏᴛ: Natural Language Querying over...

36
Emerging
467 astronomer/cosmos-ebook-companion

Companion repository to the Practical Guide: Orchestrating dbt with Apache...

36
Emerging
468 cderickson/Mox-Data.com

Mox-Data.com is a cloud-based data ingestion tool used to process raw data...

36
Emerging
469 pkochanowicz/n8n-setup-docker

Fast, safe and smart setup for self-hosted n8n placed in a Docker container,...

36
Emerging
470 Bread-Technologies/Bread-Dataset-Viewer

VS Code extension to easily view and handle large datasets. Look at...

36
Emerging
471 ErcinDedeoglu/Postalized

The ultimate address parsing tool. Effortlessly parse and expand postal data...

36
Emerging
472 Zipstack/visitran

Modern, AI-native and agentic Pythonic data transformation platform.

36
Emerging
473 bruin-data/setup-bruin

Official action to install Bruin CLI in GitHub Actions.

36
Emerging
474 GSA/coe-hud-acquisitions

A repository that contains links and information for acquisitions and...

36
Emerging
475 provero-org/provero

Declarative data quality engine. Define checks in YAML, run anywhere.

36
Emerging
476 jamie-steele/dockpipe

Run, isolate, and act — pipe commands into disposable containers and process...

35
Emerging
477 richban/opendata-stack-platform

Open Data Stack Platform: a collection of projects and pipelines built with...

35
Emerging
478 peter115342/soccer-tracker-DE-project

End-To-End Data Engineering Project. Made to learn some common data...

35
Emerging
479 equitusai/arcxa

Mapping intelligence for enterprise data migrations: schema mapping,...

35
Emerging
480 paulnamalomba/datashadric

datashadric provides a collection of well-organized modules for common data...

35
Emerging
481 ramiradwan/onlyfans-conversational-analytics

Provides a unified view of conversation and analysis data to help you...

35
Emerging
482 Bigdata-com/bigdata-briefs

Generate briefs based on financially relevant information from Bigdata.com

35
Emerging
483 apache/seatunnel-tools

SeaTunnel is a multimodal, high-performance, distributed, massive data...

35
Emerging
484 dubbl-org/dubbl

A full-featured, open-source alternative to Xero and QuickBooks. It is...

35
Emerging
485 sicara/sicarator

Instant Setup & Best Quality for Data Projects!

35
Emerging
486 Smart-Shaped/chaM3Leon

By Smart Shaped s.r.l. (https://www.smartshaped.com/)

35
Emerging
487 TJAdryan/astro_blog

This site uses the amazing Astro.build project. I added Google Docs...

35
Emerging
488 The-Pulse-Engine/Pulse-Engine_Market_Intelligence_Platform

An explainable market analysis system that combines technical indicators and...

34
Emerging
489 jroakes/SEODP

The SEO Data Platform automates SEO analysis, aggregating data from Google...

34
Emerging
490 altamsh04/deafso-backend

A scalable backend for DeafSo (Capstone)

34
Emerging
491 MTSWebServices/horizon

Simple HWM Store backend

34
Emerging
492 PHACDataHub/data-mesh-ref-impl

Data Mesh Reference Implementation with standalone example use cases

34
Emerging
493 turbot/steampipe-plugin-digitalocean

Use SQL to instantly query droplets, VPCs, users and more from DigitalOcean....

34
Emerging
494 turbot/steampipe-plugin-openapi

Use SQL to instantly query resources from OpenAPI. Open source CLI. No DB required.

34
Emerging
495 PkLavc/PkLavc.github.io

PkLavc Portfolio | Solutions & Integration Architect (Technical Owner)....

34
Emerging
496 turbot/steampipe-plugin-imap

Use SQL to instantly query mailboxes, messages and more using IMAP. Open...

34
Emerging
497 benzsevern/goldencheck

Data validation that discovers rules from your data. 19 MCP tools on...

34
Emerging
498 limhaneul12/kafka-gov

Open-Source Apache Kafka Governance Platform

34
Emerging
499 Codex-Crusader/le_Market_Intelligence_Platform

An explainable market analysis system that combines technical indicators and...

34
Emerging
500 justvinhhere/bigquery-expert

Claude Code plugin that makes Claude a BigQuery expert. 5 skills covering...

34
Emerging