h2oai/sparkling-water

Sparkling Water provides H2O functionality inside a Spark cluster

57 / 100 (Established)

This tool helps data scientists and machine learning engineers run advanced analytics and build machine learning models on large datasets. It combines the data processing of Apache Spark with the machine learning algorithms of H2O-3: you supply structured data within a Spark environment, then train and score models on it using H2O.

977 stars.

Use this if you need to build and deploy machine learning models on very large datasets already managed within an Apache Spark ecosystem.

Not ideal if your datasets are small, or if you are not already using Apache Spark for your data processing.

data science, machine learning engineering, big data analytics, predictive modeling, distributed computing
No Package, No Dependents
Maintenance 6 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 25 / 25
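
The four sub-scores above add up to the overall 57 / 100. The sketch below checks that arithmetic. It assumes the overall score is a plain sum of the four sub-scores, which matches the numbers on this page but is not stated by it; the variable names are illustrative only.

```shell
# Sub-scores as listed on this page (each out of 25).
maintenance=6
adoption=10
maturity=16
community=25

# Assumption: the overall score is the plain sum of the four sub-scores.
total=$((maintenance + adoption + maturity + community))
echo "Overall: ${total} / 100"
```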


Stars: 977
Forks: 362
Language: Scala
License: Apache-2.0
Last pushed: Nov 05, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/data-engineering/h2oai/sparkling-water"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
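
For scripted use, the curl call above could be wrapped as in the following minimal sketch. The helpers `quality_url` and `fetch_quality` are hypothetical names, and the path pattern (a category followed by an owner/repo slug) is inferred from the single URL shown here, so it may not hold for other entries. The response format is not documented on this page, so the sketch prints the body as-is.

```shell
# Hypothetical helper: builds the endpoint URL from a category and an
# owner/repo slug. The path pattern is an assumption based on one example URL.
quality_url() {
  printf 'https://pt-edge.onrender.com/api/v1/quality/%s/%s\n' "$1" "$2"
}

# Hypothetical helper: fetches the record and exits non-zero on HTTP errors
# (for example, if the daily request limit has been reached).
fetch_quality() {
  curl --fail --silent --show-error "$(quality_url "$1" "$2")"
}

# Usage (makes one network request, which counts toward the daily limit):
# fetch_quality data-engineering h2oai/sparkling-water
```

Keeping URL construction separate from the request makes it possible to inspect the URL without spending one of the 100 keyless requests per day.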