hi-primus/optimus
Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Optimus simplifies the often-complex task of preparing raw data for analysis or machine learning models. It takes messy datasets—from CSVs, JSONs, databases, or even Excel files—and provides a straightforward way to clean, transform, and explore them. This tool is ideal for data scientists, analysts, or anyone who regularly works with large and varied datasets and needs to ensure data quality before further use.
1,541 stars. No commits in the last 6 months.
Use this if you need to quickly and efficiently prepare large, messy datasets for analysis or machine learning, and want a consistent workflow whether you're working on a laptop or a powerful GPU cluster.
Not ideal if your primary need is to build complex machine learning models rather than focusing on the preceding data preparation steps.
Stars: 1,541
Forks: 232
Language: Python
License: Apache-2.0
Category
Last pushed: Dec 02, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/data-engineering/hi-primus/optimus"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
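The same request can be made from Python. The following is a minimal sketch using only the standard library. The helper names `build_quality_url` and `fetch_quality` are hypothetical, the path layout (category, owner, repo) is inferred from the single curl example above, and the response is assumed to be JSON. How an API key is passed is not stated here, so the sketch covers only the keyless tier.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; everything after it is assumed
# to follow the pattern <category>/<owner>/<repo>.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository (hypothetical helper)."""
    # Percent-encode each path segment so unusual names cannot break the URL.
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """GET the endpoint and parse the body as JSON (assumption: JSON response)."""
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (performs a network request, so it is left commented out):
# data = fetch_quality("data-engineering", "hi-primus", "optimus")
# print(data)
```

With the keyless limit of 100 requests/day, it is worth caching the result locally rather than calling the endpoint on every run.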
Related tools
fal-ai/dbt-fal
do more with dbt. dbt-fal helps you run Python alongside dbt, so you can send Slack alerts,...
galafis/distributed-data-processing-pipeline
Enterprise-grade distributed data processing pipeline with Apache Spark (Scala + Python), Delta...
hiazevedo/databricks-portfolio
Portfolio of hands-on Data Engineering and ML projects with Databricks: Delta Lake, MLflow,...
joekakone/db-analytics-tools
Databases Analytics Tools - Data Integration - Data Visualization - Machine Learning