GokuMohandas/data-engineering
Construct a modern data stack and orchestrate the workflows to create high-quality data for analytics and ML applications.
This project provides practical guidance and code to help you build a robust system for managing and delivering data for analytics and machine learning. You'll learn how to pull raw data from various sources, clean and organize it, and then store it in a form that's ready for data scientists or business analysts to use. It is designed for data engineers and MLOps engineers who need to establish reliable data pipelines.
236 stars. No commits in the last 6 months.
Use this if you need to build, automate, and orchestrate the processes that extract, transform, and load data into a structured format for machine learning models or business intelligence dashboards.
Not ideal if you are looking for a conceptual overview without hands-on implementation details or if your primary role is data analysis rather than data pipeline construction.
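As a rough illustration of the extract, transform, and load steps described above, here is a minimal sketch using only the Python standard library. It is not taken from the repository: the sample data, the `payments` table, and every function name are hypothetical, and a real stack would replace each step with dedicated tooling and an orchestrator rather than plain function calls.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; in practice this would come from an external source.
RAW_CSV = """id,name,amount
1, Alice ,10.5
2,Bob,
3, Carol ,7
"""


def extract(raw_text):
    """Extract: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: trim whitespace, drop rows with no amount, cast types."""
    cleaned = []
    for row in rows:
        amount = (row["amount"] or "").strip()
        if not amount:
            continue  # skip incomplete records
        cleaned.append((int(row["id"]), row["name"].strip(), float(amount)))
    return cleaned


def load(records, conn):
    """Load: write the cleaned records into a structured table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS payments "
        "(id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO payments VALUES (?, ?, ?)", records)
    conn.commit()


def run_pipeline(raw_text, conn):
    """Chain the three steps and return the stored rows for inspection."""
    load(transform(extract(raw_text)), conn)
    return conn.execute(
        "SELECT id, name, amount FROM payments ORDER BY id"
    ).fetchall()
```

Calling `run_pipeline(RAW_CSV, sqlite3.connect(":memory:"))` runs the whole chain against an in-memory database. Keeping each step a separate function is what lets a scheduler later run, retry, and monitor the steps independently.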
Stars: 236
Forks: 39
Language: Jupyter Notebook
License: none listed
Category:
Last pushed: Sep 12, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/mlops/GokuMohandas/data-engineering"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
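The same request can be made from Python. The sketch below mirrors the curl command using only the standard library. Reading the two path segments as a category and a repository name is an assumption based on the URL shown, and because the response format is not documented here, the function returns the raw response text without parsing it. The function names are hypothetical.

```python
import urllib.parse
import urllib.request

# Base path copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_url(category, repo):
    """Build the endpoint URL; `category` and `repo` are assumed path parts."""
    category_part = urllib.parse.quote(category)
    repo_part = urllib.parse.quote(repo, safe="/")  # keep the owner/name slash
    return f"{BASE_URL}/{category_part}/{repo_part}"


def fetch_quality(category, repo, timeout=10.0):
    """Send a GET request and return the response body as text."""
    url = build_url(category, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Example call (performs a network request, so it is left commented out):
# print(fetch_quality("mlops", "GokuMohandas/data-engineering"))
```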
Higher-rated alternatives
apache/airflow
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
mlrun/mlrun
MLRun is an open source MLOps platform for quickly building and managing continuous ML...
clearml/clearml
ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data...
argoproj-labs/hera
Hera makes Python code easy to orchestrate on Argo Workflows through native Python integrations....
argoproj/argo-workflows
Workflow Engine for Kubernetes