opengeos/aws-open-data
A list of open datasets on AWS
This project helps researchers and data scientists find and access the public datasets hosted on Amazon Web Services (AWS). It compiles the sprawling AWS Open Data listing into TSV and JSON files that are easy to consume. It is most useful to data practitioners who work on large-scale data analysis or machine learning projects and want to discover new data sources.
Use this if you need an up-to-date, programmatically accessible list of all publicly available datasets on AWS for your research or data project.
Not ideal if you are looking for a visual browser to explore datasets or require detailed metadata for each dataset beyond what a simple list provides.
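As a minimal sketch of what "programmatically accessible" could look like, the Python below parses a tab-separated catalog and searches it by keyword. The file URL, the branch name, and the helper names (`parse_tsv`, `search`, `fetch_text`) are assumptions made for illustration and are not taken from this listing; check the repository for the actual file location and column names.

```python
import csv
import io
import urllib.request

# Assumed location of the compiled TSV (hypothetical path and branch);
# verify it against the repository before relying on it.
TSV_URL = (
    "https://raw.githubusercontent.com/opengeos/aws-open-data/"
    "master/aws_open_datasets.tsv"
)


def parse_tsv(text):
    """Parse tab-separated text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def search(rows, keyword):
    """Return rows in which any text field contains the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [
        row
        for row in rows
        if any(
            isinstance(value, str) and needle in value.lower()
            for value in row.values()
        )
    ]


def fetch_text(url=TSV_URL):
    """Download a file and decode it as UTF-8."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


# Example usage (requires network access and a correct TSV_URL):
# rows = parse_tsv(fetch_text())
# for row in search(rows, "landsat"):
#     print(row)
```

Searching across every field avoids depending on specific column names, which this listing does not document.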
Stars: 59
Forks: 9
Language: Python
License: MIT
Category
Last pushed: Mar 13, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/opengeos/aws-open-data"
Open to everyone: 100 requests/day without a key, or 1,000 requests/day with a free key.
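The same request can be sketched in Python using only the standard library. The endpoint is the one shown in the curl command; the helper names `build_url` and `fetch_quality` are hypothetical, and the code assumes the endpoint returns JSON, which this page does not state.

```python
import json
import urllib.request

# Base path taken from the curl command on this page.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def build_url(category, owner, repo):
    """Build the endpoint URL for one repository."""
    return f"{API_BASE}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo):
    """Request the record for one repository and decode the body.

    Assumption: the response body is JSON. No API key is sent,
    so the keyless daily limit applies.
    """
    url = build_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage (requires network access):
# record = fetch_quality("ml-frameworks", "opengeos", "aws-open-data")
# print(record)
```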
Higher-rated alternatives
open-edge-platform/datumaro
Dataset Management Framework, a Python library and a CLI tool to build, analyze and manage...
explosion/ml-datasets
🌊 Machine learning dataset loaders for testing and example scripts
webdataset/webdataset
A high-performance Python-based I/O system for large (and small) deep learning problems, with...
tensorflow/datasets
TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...
mlcommons/croissant
Croissant is a high-level format for machine learning datasets that brings together four rich layers.