vaexio/vaex
Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
When working with massive datasets, Vaex lets you quickly load, analyze, and visualize tabular data that is too big to fit in your computer's memory. It takes large data files (such as HDF5 or Apache Arrow) as input and memory-maps them, so they are processed without first being copied into RAM. It then outputs statistics, visualizations (histograms, density plots), or transformed datasets. It is aimed at data scientists, analysts, and researchers working with extremely large tables.
8,492 stars. Used by 1 other package. Available on PyPI.
Use this if you need to perform interactive data exploration, calculate statistics, or create visualizations on tabular datasets containing billions of rows, directly on your laptop without needing a cluster.
Not ideal if your datasets are small enough to be handled comfortably by tools like Pandas or Excel, since a specialized 'out-of-core' tool adds overhead without offering any benefit at that scale.
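'Out-of-core' means that data is streamed from disk in pieces instead of being loaded into memory all at once. The sketch below illustrates that general idea with NumPy's `np.memmap`, using a chunked mean and a chunked histogram. It is a simplified stand-in and not how vaex is implemented internally. The function name `chunked_stats` and the raw float64 file format are hypothetical choices made for this example.

```python
import os
import tempfile

import numpy as np


def chunked_stats(path, dtype, chunk_rows, bins, value_range):
    """Stream a memory-mapped 1-D array in chunks; return (mean, histogram)."""
    data = np.memmap(path, dtype=dtype, mode="r")  # maps the file, reads nothing yet
    total = 0.0
    n = 0
    hist = np.zeros(bins, dtype=np.int64)
    for start in range(0, data.shape[0], chunk_rows):
        # A slice of a memmap is a view; its pages are read from disk on access.
        chunk = np.asarray(data[start:start + chunk_rows])
        total += float(chunk.sum())
        n += chunk.size
        # Fixed bin edges (from value_range) let per-chunk counts be summed.
        h, _ = np.histogram(chunk, bins=bins, range=value_range)
        hist += h
    return total / n, hist


# Usage: write a small raw float64 file, then process it 3 values at a time.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "values.f64")  # hypothetical file name
    np.arange(10, dtype=np.float64).tofile(path)
    mean, hist = chunked_stats(path, np.float64, chunk_rows=3,
                               bins=5, value_range=(0.0, 10.0))
```

The mean is accumulated as a running sum and count, and the histogram uses a fixed `range`. Because of this, the arrays the loop allocates scale with `chunk_rows` rather than with the size of the file. On small data this bookkeeping is pure overhead, which is the trade-off noted above.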
Stars: 8,492
Forks: 603
Language: Python
License: MIT
Category:
Last pushed: Mar 01, 2026
Commits (last 30 days): 0
Dependencies: 7
Reverse dependents: 1
Related tools
mage-ai/mage-ai
🧙 Build, run, and manage data pipelines for integrating and transforming data.
alibaba/feathub
FeatHub - A stream-batch unified feature store for real-time machine learning
mindsdb/dbt-mindsdb
dbt adapter for connecting to MindsDB
kevin-hanselman/dud
A lightweight CLI tool for versioning data alongside source code and building data pipelines.
Bread-Technologies/Bread-Dataset-Viewer
VS Code extension to easily view and handle large datasets. Look at JSONL/Parquet/CSV files...