RumbleDB/rumble

Quick start: pip install jsoniq ⛈️ RumbleDB 2.0.0 "Lemon Ironwood" 🌳 for Apache Spark | Run queries on your large-scale, messy datasets (JSON, text, CSV, Parquet, Delta...) | Data Lakehouse with Updates, Scripting, Declarative Machine Learning and more

59 / 100 (Established)

This tool helps data professionals clean, prepare, and validate large, varied datasets from sources like JSON, CSV, or Parquet files, often stored in data lakes. It takes your raw, messy data and transforms it into clean, structured inputs ready for analysis or machine learning pipelines. Data engineers, scientists, and analysts who work with diverse data formats at scale will find this useful.

Use this if you need to query and transform large volumes of complex, nested, or heterogeneous data that doesn't easily fit into traditional tables or dataframes.

Not ideal if you primarily work with small, clean, tabular datasets where a standard spreadsheet or SQL database is sufficient.

data-lake-management data-preparation data-wrangling ETL big-data-analytics
No Package · No Dependents
Maintenance 10 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 23 / 25

Stars: 239
Forks: 84
Language: Java
License:
Last pushed: Mar 12, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/data-engineering/RumbleDB/rumble"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
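
For scripted use, the same request can be sketched in Python with only the standard library. This is a minimal sketch under stated assumptions: the URL pattern (category, owner, repository) is inferred from the single curl example above, the response is assumed to be JSON, and the helper names `quality_url` and `fetch_quality` are hypothetical. How an API key is passed is not documented here, so the sketch uses keyless access only.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL (pattern assumed from the one documented example)."""
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record; assumes the endpoint returns a JSON body."""
    request = urllib.request.Request(
        quality_url(category, owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Keyless access is limited to 100 requests/day, so avoid polling in a loop.
    print(quality_url("data-engineering", "RumbleDB", "rumble"))
```

Calling `fetch_quality("data-engineering", "RumbleDB", "rumble")` performs the network request; the shape of the returned data is not specified in this listing, so inspect it before relying on particular fields.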