All Data Engineering Tools

1,297 tools ranked by quality score · Page 4 of 13

Showing 301–400 of 1,297
# Tool Score Tier
301 wp-labs/warp-parse

Focusing on building industry-leading ETL engines.

46
Emerging
302 turbot/steampipe-plugin-shopify

Use SQL to instantly query Shopify products, orders and more. Open source...

46
Emerging
303 colliery-io/cloacina

Embedded workflow orchestration library for Rust and Python. Build...

46
Emerging
304 turbot/steampipe-plugin-ipstack

Use SQL to instantly query IP geolocation and more from ipstack. Open source...

46
Emerging
305 strake-data/strake

The Data Layer for AI. A high-performance federated SQL engine that gives AI...

46
Emerging
306 turbot/steampipe-plugin-snowflake

Use SQL to instantly query Snowflake resources. Open source CLI. No DB required.

45
Emerging
307 Ultrathink-Solutions/openclaw-logfire

Pydantic Logfire observability plugin for OpenClaw — OTEL GenAI semantic...

45
Emerging
308 DevDizzle/gammarips-engine

An end-to-end, serverless AI platform built on Google Cloud that...

45
Emerging
309 turbot/steampipe-plugin-nomad

Use SQL to instantly query Nomad ACLs, deployments, namespaces & more. Open...

45
Emerging
310 sipist/sipist-workspace

This repository provides containerized applications and microservices for...

45
Emerging
311 turbot/steampipe-plugin-twilio

Use SQL to instantly query Twilio resources across accounts. Open source...

45
Emerging
312 turbot/steampipe-plugin-auth0

Use SQL to instantly query Auth0 resources. Open source CLI. No DB required.

45
Emerging
313 turbot/steampipe-plugin-tailscale

Use SQL to instantly query Tailscale resources. Open source CLI. No DB required.

45
Emerging
314 turbot/steampipe-plugin-equinix

Use SQL to instantly query infrastructure resources (e.g. servers, networks)...

45
Emerging
315 weld-project/weld

High-performance runtime for data analytics applications

45
Emerging
316 hiero-hackers/analytics

Stay up to date with hiero organisation activity and contributor diversity

45
Emerging
317 turbot/steampipe-plugin-hcloud

Use SQL to instantly query servers, networks and more from Hetzner Cloud....

45
Emerging
318 turbot/steampipe-plugin-servicenow

Use SQL to instantly query ServiceNow CMDB CI services, servers, incidents,...

45
Emerging
319 logjuicer/logjuicer

LogJuicer extracts anomalies from logs

45
Emerging
320 icoretech/airbroke

🔥 Lightweight, Airbrake/Sentry-compatible, PostgreSQL-based Open Source Error Catcher

45
Emerging
321 turbot/steampipe-plugin-googledirectory

Use SQL to instantly query users, groups, domains and more from Google...

45
Emerging
322 terrylica/exness-data-preprocess

Professional forex tick data preprocessing with unified DuckDB storage,...

45
Emerging
323 AndreaBozzo/Ceres

Harvesting & Semantic search for open data portals

45
Emerging
324 bdist/bdist-workspace

This repository provides containerized applications and microservices for...

45
Emerging
325 GregoryKogan/yt-framework

Build scalable data pipelines on YTsaurus with automatic stage management,...

45
Emerging
326 19-84/redd-archiver

A PostgreSQL-backed archive generator that creates browsable HTML archives...

45
Emerging
327 skale-me/skale

High performance distributed data processing engine

45
Emerging
328 turbot/steampipe-plugin-scaleway

Use SQL to instantly query instances, networks, databases, and more from...

45
Emerging
329 turbot/steampipe-plugin-finance

Use SQL to instantly query financial data including quotes (equities,...

45
Emerging
330 bastienboutonnet/sheetwork

A handy package to load Google Sheets to your database right from the CLI...

45
Emerging
331 samber/awesome-olap

🧊 A curated list of OLAP databases, data lake tools, columnar engines, and...

45
Emerging
332 AMPATH/etl-rest-server

This project hosts scripts to generate flat tables used for reporting purposes.

45
Emerging
333 turbot/steampipe-plugin-steampipe

Use SQL to instantly query plugin metadata from the Steampipe Hub. Open...

45
Emerging
334 CategoricalData/CQL

Categorical Query Language IDE

45
Emerging
335 pretzelai/pretzelai

The modern replacement for Jupyter Notebooks

45
Emerging
336 turbot/steampipe-plugin-vanta

Use SQL to instantly query Vanta resources. Open source CLI. No DB required.

45
Emerging
337 B1AAB/EBA

An ML-first temporal graph of Bitcoin's on-chain fund flows.

45
Emerging
338 MTSWebServices/syncmaster-ui

Frontend for Syncmaster, a no-code ETL tool. WIP

44
Emerging
339 turbot/steampipe-plugin-workos

Use SQL to instantly query resources from WorkOS. Open source CLI. No DB required.

44
Emerging
340 ineelhere/forex-connect

Streamlit Connection to Explore Foreign Currency Exchange rates 💰 in real-time

44
Emerging
341 rocketlaunchr/dataframe-go

DataFrames for Go: For statistics, machine-learning, and data...

44
Emerging
342 netxs2000/devops

DevOps Data Application Platform...

44
Emerging
343 rannd1nt/phaethon

Dimensional Data Pipeline & Semantic Data Engineering Framework

44
Emerging
344 turbot/steampipe-plugin-code

Use SQL to instantly query secrets and more from source code. Open source...

44
Emerging
345 alexhraber/flowhawk

Real-time eBPF-powered network security monitor with AI-driven threat...

44
Emerging
346 orchest/orchest

Build data pipelines, the easy way 🛠️

44
Emerging
347 moj-analytical-services/etl_manager

A python package to create a database on the platform using our moj data...

44
Emerging
348 ludovicschmetz-stack/datavow

Open-source data contract enforcement — define, sync dbt, validate, block,...

44
Emerging
349 DataKitchen/dataops-observability

DataOps Observability is part of DataKitchen's Open Source Data...

44
Emerging
350 turbot/steampipe-plugin-whois

Use SQL to instantly query WHOIS. Open source CLI. No DB required.

44
Emerging
351 jitsucom/bulker

Service for bulk-loading data to databases with automatic schema management...

44
Emerging
352 realdatadriven/etlx

ETL / ELT Framework powered by DuckDB, designed to seamlessly integrate and...

44
Emerging
353 AltimateAI/altimate-code

Open-source agentic data engineering harness for dbt, SQL, and cloud...

44
Emerging
354 turbot/steampipe-plugin-mongodbatlas

Use SQL to instantly query MongoDB Atlas resources. Open source CLI. No DB required.

44
Emerging
355 turbot/steampipe-plugin-ansible

Use SQL to instantly query Ansible resources. Open source CLI. No DB required.

44
Emerging
356 yanghaiji/JsonCleanseETL

JSONCleanseETL is a professional data cleaning and transformation tool designed to give users an efficient solution for processing JSON-format data. ...

44
Emerging
357 melvynator/ELK_twitter

This is a data pipeline for Twitter (ETL) using the elastic stack...

44
Emerging
358 turbot/steampipe-plugin-zoom

Use SQL to instantly query meetings, users & more from Zoom. Open source...

44
Emerging
359 turbot/steampipe-plugin-sentry

Use SQL to instantly query Sentry organizations, projects, teams and more....

44
Emerging
360 turbot/steampipe-plugin-dockerhub

Use SQL to instantly query Docker Hub repositories, tags, tokens and more....

44
Emerging
361 turbot/steampipe-plugin-linear

Use SQL to instantly query Linear organizations, projects, teams, users &...

44
Emerging
362 nationalarchives/ds-caselaw-ingester

Parse judgements from the Transformation Engine and load them into MarkLogic...

44
Emerging
363 turbot/steampipe-plugin-hackernews

Use SQL to instantly query stories, users and other items from Hacker News....

44
Emerging
364 ContextData/VectorETL

Build super simple end-to-end data & ETL pipelines for your vector databases...

43
Emerging
365 Trojan3877/AWS-SageMaker-Snowflake-ML-Pipeline

The AWS SageMaker + Snowflake ML Pipeline is a fully production-grade,...

43
Emerging
366 aborruso/arigadicomando

Italian-language documentation on CLI tools for structured data and AI: CSV,...

43
Emerging
367 yobix-ai/extractous

Fast and efficient unstructured data extraction. Written in Rust with...

43
Emerging
368 bywwcnll/StreamPanel

Stream Panel is a Chrome DevTools extension that lets developers monitor and inspect streaming requests in real time. It supports Server-Sent Events (SSE) and those based on...

43
Emerging
369 fal-ai/dbt-fal

do more with dbt. dbt-fal helps you run Python alongside dbt, so you can...

43
Emerging
370 turbot/steampipe-plugin-chaos

Chaos plugin for testing Steampipe with the craziest edge cases we can think...

43
Emerging
371 turbot/steampipe-plugin-ipinfo

Use SQL to instantly query ipinfo.io for IP address information. Open source...

43
Emerging
372 trustedshops-public/schema2pyarrow

Converts AsyncApi and JsonSchema to PyArrow schema

43
Emerging
373 cloverdx/cloverdx-server-docker

CloverDX Docker container for CloverDX Server deployment including examples.

43
Emerging
374 emeraldpay/dshackle-archive

ETL for Bitcoin and Ethereum data

43
Emerging
375 datacompose/datacompose

Data Cleaning for Pyspark

42
Emerging
376 DataZooDE/flapi

API Framework heavily relying on the power of DuckDB and DuckDB extensions....

42
Emerging
377 turbot/steampipe-plugin-tfe

Use SQL to instantly query workspaces, runs and more from Terraform...

42
Emerging
378 turbot/steampipe-plugin-crtsh

Use SQL to instantly query crt.sh for certificates, log entries and more....

42
Emerging
379 cnstlungu/postcard-company-datamart

learning-by-doing data model built with dbt-core

42
Emerging
380 turbot/steampipe-plugin-hibp

Use SQL to instantly query breaches, passwords, pastes and more from HIBP....

42
Emerging
381 turbot/steampipe-plugin-trivy

Use SQL to instantly query advisories, vulnerabilities, packages, findings...

42
Emerging
382 digitalghost-dev/poke-cli

A hybrid CLI/TUI tool written in Go for viewing Pokémon data from the...

42
Emerging
383 DawnbrandBots/yaml-yugipedia

An automatically-updated collection of wikitexts from Yugipedia. Part of YAML Yugi.

42
Emerging
384 turbot/steampipe-plugin-databricks

Use SQL to instantly query Databricks resources. Open source CLI. No DB required.

42
Emerging
385 Beyond-Finance/dataeng-de-technical-assessment

Public repo of Beyond Finance's technical assessment for Data Engineering candidates

42
Emerging
386 ChrisDevRepo/vscode_data_lineage

VS Code extension for visualizing SQL Server database object dependencies...

42
Emerging
387 wharfie/wharfie

Wharfie is an experimental table-oriented data application framework built...

42
Emerging
388 FrigadeHQ/trench

Trench — Open-Source Analytics Infrastructure. A single production-ready...

42
Emerging
389 probcomp/bayeslite

BayesDB on SQLite. A Bayesian database table for querying the probable...

42
Emerging
390 zappzerapp/laravel-ingest

A robust, configuration-driven ETL and data import framework for Laravel....

42
Emerging
391 MTSWebServices/syncmaster

No-code ETL tool, based on onETL + PySpark

41
Emerging
392 turbot/steampipe-plugin-linkedin

Use SQL to instantly query LinkedIn for profiles, companies, connections &...

41
Emerging
393 ankiano/etl

Extract-transform-load CLI tool for extracting small and medium data volume...

41
Emerging
394 ccao-data/data-architecture

Codebase for CCAO data infrastructure construction and management

41
Emerging
395 zero-one-group/geni

A Clojure dataframe library that runs on Spark

41
Emerging
396 chalk-ai/chalk-go

Go client for Chalk

41
Emerging
397 turbot/steampipe-plugin-abuseipdb

Use SQL to instantly query IP abuse scores and more from AbuseIPDB. Open...

41
Emerging
398 turbot/steampipe-plugin-grafana

Use SQL to instantly query dashboards, data sources, users and more from...

41
Emerging
399 RustedBytes/audios-to-dataset

Convert your audio files into DuckDB or Parquet files

41
Emerging
400 intel/hdk

A low-level execution library for analytic data processing.

41
Emerging