eto-ai/rikai
Parquet-based ML data format optimized for working with unstructured data
Rikai helps AI practitioners manage large collections of unstructured data like images or videos for machine learning projects. It takes raw media files and annotations, organizes them into a structured format, and outputs readily usable datasets for model training or analysis using tools like PyTorch or Spark. This is for machine learning engineers, data scientists, and AI researchers who work with computer vision or other unstructured data types.
141 stars. No commits in the last 6 months. Available on PyPI.
Use this if you need a streamlined way to handle, query, and prepare vast amounts of unstructured data for your AI models, especially when working with Spark.
Not ideal if your primary data consists only of structured tables, or if you prefer not to use Apache Spark for your data processing.
Stars: 141
Forks: 22
Language: Jupyter Notebook
License: Apache-2.0
Category:
Last pushed: Jan 05, 2023
Commits (30d): 0
Dependencies: 12
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/eto-ai/rikai"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
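The same request can be made from Python using only the standard library. This is a minimal sketch, not an official client: the helper names `quality_url` and `fetch_quality` are hypothetical, and treating the response body as JSON is an assumption, since the listing does not document the response format.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, repo: str) -> str:
    """Build the endpoint URL for a repository such as 'eto-ai/rikai'."""
    # Keep the owner/name slash in the repo; percent-encode anything unusual.
    return f"{BASE_URL}/{quote(category, safe='')}/{quote(repo, safe='/')}"


def fetch_quality(category: str, repo: str, timeout: float = 10.0):
    """Fetch the endpoint and decode the body.

    Assumption: the body is JSON. If it is not, json.loads raises ValueError.
    """
    url = quality_url(category, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Prints the URL only; call fetch_quality(...) to make a real request.
    print(quality_url("ml-frameworks", "eto-ai/rikai"))
```

Calling `fetch_quality("ml-frameworks", "eto-ai/rikai")` issues one request, which counts against the daily limit described above.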
Related frameworks
treeverse/dvc
🦉 Data Versioning and ML Experiments
runpod/runpod-python
🐍 | Python library for RunPod API and serverless worker SDK.
microsoft/vscode-jupyter
VS Code Jupyter extension
4paradigm/OpenMLDB
OpenMLDB is an open-source machine learning database that provides a feature platform computing...
uber/petastorm
Petastorm library enables single machine or distributed training and evaluation of deep learning...