Spark, Hadoop, and ML Pipelines: Data Engineering Tools
This list tracks 20 Spark, Hadoop, and ML pipeline tools. Ten score above 50, which places them in the Established tier. The highest-rated is knime/knime-core at 68/100 with 772 stars. Two of the top 10 are actively maintained.
Get all 20 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=data-engineering&subcategory=spark-hadoop-ml-pipelines&limit=20"
The API is open to everyone: 100 requests/day with no key needed, or 1,000 requests/day with a free key.
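The same request can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the function names `build_quality_url` and `fetch_quality_dataset` are hypothetical, the query parameters are copied from the curl command above, and the shape of the returned JSON is not documented here, so the result is returned as-is without assuming any field names. How a free key is passed is also not specified in this page, so it is not shown.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_quality_url(domain="data-engineering",
                      subcategory="spark-hadoop-ml-pipelines",
                      limit=20):
    """Build the request URL (hypothetical helper, not part of any SDK)."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_quality_dataset(url, timeout=10.0):
    """Fetch the URL and parse the body as JSON.

    The response schema is an unknown here, so the parsed value is
    returned unchanged for the caller to inspect.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality_dataset(build_quality_url())` performs the network request; keep the unauthenticated limit of 100 requests/day in mind when scripting against it.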
| # | Tool | Description | Score | Tier |
|---|---|---|---|---|
| 1 | knime/knime-core | KNIME Analytics Platform | 68 | Established |
| 2 | sparklyr/sparklyr | R interface for Apache Spark | | Established |
| 3 | apache/wayang | Apache Wayang is the first cross-platform data processing system. | | Established |
| 4 | quixio/quix-streams | Python Streaming DataFrames for Kafka | | Established |
| 5 | jtablesaw/tablesaw | Java dataframe and visualization library | | Established |
| 6 | RumbleDB/rumble | Quick start: pip install jsoniq ⛈️ RumbleDB 2.0.0 "Lemon Ironwood" 🌳 for... | | Established |
| 7 | dotnet/spark | .NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers. | | Established |
| 8 | h2oai/sparkling-water | Sparkling Water provides H2O functionality inside Spark cluster | | Established |
| 9 | evinism/mistql | A query / expression language for performing computations on JSON-like... | | Established |
| 10 | byzer-org/byzer-lang | Byzer (former MLSQL): A low-code open-source programming language for data... | | Established |
| 11 | mc2-project/opaque-sql | An encrypted data analytics platform | | Emerging |
| 12 | viadee/camunda-kafka-polling-client | Stream your process history to Kafka | | Emerging |
| 13 | Smart-Shaped/chaM3Leon | By Smart Shaped s.r.l. (https://www.smartshaped.com/) | | Emerging |
| 14 | rhinempi/sparkhit | sparkhit - analyzing large scale genomic data on the cloud | | Emerging |
| 15 | perguard/pg-streaming-performance-data | Data collection, feature engineering and machine learning of performance traces | | Emerging |
| 16 | AvaAvarai/Java-Parallel-Coordinates-Vis | Java Parallel Coordinates Visualization Tool, to visualize... | | Emerging |
| 17 | dhchenx/Catla-HS | Catla for Hadoop and Spark (Catla-HS): An open-source system to support... | | Experimental |
| 18 | maengsanha/bigdata | KMU CS Hot Topics in Big Data | | Experimental |
| 19 | aymane-maghouti/Big-Data-Project | This project aims to predict smartphone prices using a combination of batch... | | Experimental |
| 20 | maistrovyi/actio | actio | | Experimental |