RumbleDB/rumble
Quick start: pip install jsoniq. ⛈️ RumbleDB 2.0.0 "Lemon Ironwood" 🌳 for Apache Spark | Run queries on your large-scale, messy datasets (JSON, text, CSV, Parquet, Delta...) | Data Lakehouse with Updates, Scripting, Declarative Machine Learning and more
This tool helps data professionals clean, prepare, and validate large, varied datasets from sources like JSON, CSV, or Parquet files, often stored in data lakes. It takes your raw, messy data and transforms it into clean, structured inputs ready for analysis or machine learning pipelines. Data engineers, scientists, and analysts who work with diverse data formats at scale will find this useful.
Use this if you need to query and transform large volumes of complex, nested, or heterogeneous data that doesn't easily fit into traditional tables or dataframes.
Not ideal if you primarily work with small, clean, tabular datasets where a standard spreadsheet or SQL database is sufficient.
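"Heterogeneous" here means records that do not share one schema. A field may be missing, may change type from one record to the next, or may sit at a different nesting depth. The minimal sketch below is written in plain Python, not in RumbleDB's JSONiq, and does this normalization by hand on a few records, which is the kind of work a tool like this aims to express as queries over much larger collections. The sample records and the helper names (normalize_tags, extract_name, clean) are hypothetical. They are not part of RumbleDB or the jsoniq package.

```python
import json
from typing import Any, Dict, Iterable, List, Optional

# Hypothetical sample: JSON Lines records whose shape varies from record to record.
RAW_LINES = [
    '{"id": 1, "name": "Ada", "tags": ["math", "engines"]}',
    '{"id": 2, "name": "Grace", "tags": "compilers"}',  # tags is a string, not a list
    '{"id": 3, "contact": {"name": "Edsger"}}',         # name is nested, tags is missing
    'not valid json',                                   # malformed line
]


def normalize_tags(value: Any) -> List[str]:
    """Coerce a 'tags' field that may be missing, a string, or a list into a list of strings."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    if isinstance(value, list):
        return [str(v) for v in value]
    return [str(value)]


def extract_name(record: Dict[str, Any]) -> Optional[str]:
    """Look for a name at the top level first, then inside a nested 'contact' object."""
    if "name" in record:
        return record["name"]
    contact = record.get("contact")
    if isinstance(contact, dict):
        return contact.get("name")
    return None


def clean(lines: Iterable[str]) -> List[Dict[str, Any]]:
    """Parse each line and project it onto one fixed shape: id, name, tags."""
    rows: List[Dict[str, Any]] = []
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines instead of failing the whole job
        if not isinstance(record, dict):
            continue  # ignore top-level values that are not objects
        rows.append({
            "id": record.get("id"),
            "name": extract_name(record),
            "tags": normalize_tags(record.get("tags")),
        })
    return rows


if __name__ == "__main__":
    for row in clean(RAW_LINES):
        print(row)
```

Every special case above (string versus list, nested versus top-level, malformed input) had to be anticipated in code. That per-record effort is what grows unmanageable as data volume and variety increase.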
Stars: 239
Forks: 84
Language: Java
License: not specified
Category: not specified
Last pushed: Mar 12, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/data-engineering/RumbleDB/rumble"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
Related tools
knime/knime-core: KNIME Analytics Platform
sparklyr/sparklyr: R interface for Apache Spark
apache/wayang: Apache Wayang is the first cross-platform data processing system.
quixio/quix-streams: Python Streaming DataFrames for Kafka
jtablesaw/tablesaw: Java dataframe and visualization library