vaexio/vaex

Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀


Established

Vaex lets you quickly load, analyze, and visualize tabular data that is too big to fit in your computer's memory. It takes large data files (such as HDF5 or Apache Arrow) as input, processes them without copying everything into RAM, and outputs statistics, visualizations (histograms, density plots), or transformed datasets. It is aimed at data scientists, analysts, and researchers who work with extremely large tables.
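The idea of processing a file without copying it all into RAM can be illustrated with a small standard-library sketch. This is not Vaex's API or implementation: the helper names `write_column` and `chunked_stats` and the raw float64 file layout are assumptions made purely for illustration. The sketch memory-maps a file and accumulates a count, a mean, and a histogram one chunk at a time, so only one chunk is held in memory at once.

```python
import mmap
import os
import tempfile
from array import array


def write_column(path, values):
    """Write values as raw native float64 bytes (hypothetical on-disk layout)."""
    with open(path, "wb") as f:
        array("d", values).tofile(f)


def chunked_stats(path, chunk_rows=1024, bins=4, lo=0.0, hi=1.0):
    """Return (count, mean, histogram) by scanning a memory-mapped file in chunks."""
    hist = [0] * bins
    # An empty file cannot be memory-mapped, so handle it up front.
    if os.path.getsize(path) == 0:
        return 0, float("nan"), hist

    item = array("d").itemsize  # bytes per float64 value
    width = (hi - lo) / bins
    count = 0
    total = 0.0
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        n_rows = len(mm) // item
        for start in range(0, n_rows, chunk_rows):
            stop = min(start + chunk_rows, n_rows)
            # Only this slice is copied into Python memory.
            chunk = array("d")
            chunk.frombytes(mm[start * item:stop * item])
            for v in chunk:
                count += 1
                total += v
                if lo <= v < hi:
                    # Clamp to guard against floating-point rounding at the top edge.
                    hist[min(bins - 1, int((v - lo) / width))] += 1
    return count, total / count, hist


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "column.bin")
        write_column(path, [0.1, 0.3, 0.6, 0.9, 0.2])
        print(chunked_stats(path, chunk_rows=2))
```

The chunk size trades memory for loop overhead: a larger `chunk_rows` means fewer slices but more data resident at once. A production library would do the per-chunk work in compiled code rather than a Python loop.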

8,492 stars. Used by 1 other package. Available on PyPI.

Use this if you need to perform interactive data exploration, calculate statistics, or create visualizations on tabular datasets containing billions of rows, directly on your laptop without needing a cluster.

Not ideal if your datasets are small enough to be handled comfortably by tools like Pandas or Excel; a specialized out-of-core solution adds overhead without a matching benefit.

big-data-analysis data-visualization large-scale-data-exploration statistical-analysis machine-learning-preparation

Maintenance 10 / 25

Adoption 11 / 25

Maturity 25 / 25

Community 19 / 25


Stars 8,492

Forks 603

Language Python

License MIT

Related tools

mage-ai/mage-ai

🧙 Build, run, and manage data pipelines for integrating and transforming data.

alibaba/feathub

FeatHub - A stream-batch unified feature store for real-time machine learning

mindsdb/dbt-mindsdb

dbt adapter for connecting to MindsDB

kevin-hanselman/dud

A lightweight CLI tool for versioning data alongside source code and building data pipelines.

Bread-Technologies/Bread-Dataset-Viewer

VS Code extension to easily view and handle large datasets. Look at JSONL/Parquet/CSV files...
