RumbleDB/rumble

Quick start: pip install jsoniq ⛈️ RumbleDB 2.0.0 "Lemon Ironwood" 🌳 for Apache Spark | Run queries on your large-scale, messy datasets (JSON, text, CSV, Parquet, Delta...) | Data Lakehouse with Updates, Scripting, Declarative Machine Learning and more


Established

RumbleDB is a query engine that runs JSONiq queries on Apache Spark. Data professionals use it to clean, prepare, and validate large, varied datasets stored as JSON, CSV, Parquet, or other files, often in data lakes. It turns raw, messy data into clean, structured inputs for analysis or machine learning pipelines. It is aimed at data engineers, data scientists, and analysts who work with diverse data formats at scale.

239 stars.

Use this if you need to query and transform large volumes of complex, nested, or heterogeneous data that doesn't easily fit into traditional tables or dataframes.

Not ideal if you primarily work with small, clean, tabular datasets where a standard spreadsheet or SQL database is sufficient.
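To make "complex, nested, or heterogeneous data" concrete, here is a small plain-Python sketch of the per-record cleanup such data needs. It uses only the standard library and is not RumbleDB's API. The sample records and the `clean` and `as_list` helpers are hypothetical. RumbleDB's approach is to express this kind of logic as a declarative JSONiq query that Spark executes in parallel, rather than as a hand-written loop like this one.

```python
import json
from typing import Any, Iterable

# Hypothetical sample input: JSON Lines records whose shape varies from
# line to line (missing keys, a scalar in one record and an array in the
# next, a string id instead of a number, and one malformed line).
RAW_LINES = [
    '{"id": 1, "name": "Ada", "tags": "admin", "address": {"city": "Zurich"}}',
    '{"id": 2, "name": "Linus", "tags": ["dev", "ops"]}',
    '{"id": "3", "tags": [], "address": {"city": "Geneva", "zip": "1201"}}',
    'not valid json',
]


def as_list(value: Any) -> list:
    """Map None to [], wrap a scalar in a list, and leave lists unchanged."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def clean(lines: Iterable[str]) -> list[dict]:
    """Turn messy JSON Lines into flat rows with a fixed set of columns."""
    rows = []
    for line in lines:
        try:
            record = json.loads(line)
            row_id = int(record["id"])  # coerce "3" -> 3; fails on non-objects
        except (json.JSONDecodeError, KeyError, TypeError, ValueError):
            continue  # skip malformed or id-less records instead of aborting
        address = record.get("address")
        if not isinstance(address, dict):
            address = {}  # treat a missing or oddly typed address as empty
        rows.append({
            "id": row_id,
            "name": record.get("name"),  # None when the key is absent
            "tags": as_list(record.get("tags")),
            "city": address.get("city"),
        })
    return rows


if __name__ == "__main__":
    for row in clean(RAW_LINES):
        print(row)
```

Every row comes out with the same four keys, so the result fits a table even though no two input records share a shape. At data-lake scale, the point of an engine like RumbleDB is to avoid running this kind of loop on a single machine.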

data-lake-management data-preparation data-wrangling ETL big-data-analytics

No Package · No Dependents

Maintenance 10 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 23 / 25


Stars

239

Forks

Language

Java

License

—

Related tools

knime/knime-core

KNIME Analytics Platform

sparklyr/sparklyr

R interface for Apache Spark

apache/wayang

Apache Wayang is the first cross-platform data processing system.

quixio/quix-streams

Python Streaming DataFrames for Kafka

jtablesaw/tablesaw

Java dataframe and visualization library
