h2oai/sparkling-water

Sparkling Water provides H2O functionality inside a Spark cluster

Established

This tool helps data scientists and machine learning engineers who work with large datasets perform advanced analytics and build machine learning models efficiently. It combines the data processing power of Apache Spark with the high-performance machine learning algorithms of H2O-3. You supply structured data as Spark DataFrames, and Sparkling Water lets you train H2O models on that data and use them to score new records.

977 stars.

Use this if you need to build and deploy machine learning models on very large datasets already managed within an Apache Spark ecosystem.

Not ideal if your datasets are small, or if you are not already using Apache Spark for your data processing.

data science, machine learning engineering, big data analytics, predictive modeling, distributed computing

No package, no dependents

Maintenance 6 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 25 / 25

Stars: 977
Forks: 362
Language: Scala
License: Apache-2.0

Related tools

knime/knime-core: KNIME Analytics Platform

sparklyr/sparklyr: R interface for Apache Spark

apache/wayang: Apache Wayang is the first cross-platform data processing system.

quixio/quix-streams: Python Streaming DataFrames for Kafka

jtablesaw/tablesaw: Java dataframe and visualization library