stuartemiddleton/glosat_table_dataset
GloSAT Historical Measurement Table Dataset
This dataset provides scanned historical measurement tables from ship logs and land stations, aimed at researchers and data rescue specialists working with historical climate data. It includes annotations for tables, cells, headings, headers, and captions, which can be used to train and test models that automatically extract structured data from such documents. Models trained on it produce a structured representation of the tables in a scanned image, which supports digitizing and analyzing historical records for scientific research.
Use this if you are a climate scientist, historian, or data archivist needing to automate the extraction of tabular data from scanned historical documents like ship logs or old weather station records.
Not ideal if you are looking for a ready-to-use tool to extract data from modern, born-digital tables, or if your primary interest is general document layout analysis rather than table structure specifically.
Stars: 11
Forks: not listed
Language: Python
License: not listed
Category: not listed
Last pushed: Dec 03, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/stuartemiddleton/glosat_table_dataset"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
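The same request can be made from Python. The following is a minimal sketch using only the standard library, not an official client. The helper names `build_quality_url` and `fetch_quality` are hypothetical, and the assumption that the endpoint returns a JSON body is not confirmed by this page, so inspect the raw response before relying on any particular field.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL in the same shape as the curl example.

    Hypothetical helper: each path segment is percent-encoded so that
    unusual repository names cannot break the URL.
    """
    segments = [quote(part, safe="") for part in (category, owner, repo)]
    return f"{BASE_URL}/{'/'.join(segments)}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record for one repository.

    Assumption: the response body is JSON. If it is not, json.loads
    raises json.JSONDecodeError and the caller should read the raw text.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        body = response.read().decode("utf-8")
    return json.loads(body)


# Example call (performs a network request, so it is left commented out):
# record = fetch_quality("ml-frameworks", "stuartemiddleton", "glosat_table_dataset")
# print(json.dumps(record, indent=2))
```

Because the unauthenticated tier is limited to 100 requests/day, it is worth caching the result locally rather than calling `fetch_quality` in a loop.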
Higher-rated alternatives
Psarpei/Multi-Type-TD-TSR
Extracting Tables from Document Images using a Multi-stage Pipeline for Table Detection and...
Layout-Parser/layout-parser
A Unified Toolkit for Deep Learning Based Document Image Analysis
Sudhanshu1304/table-transformer
🔍 Table Extraction Tool: A powerful open-source solution combining OCR and computer vision for...
asagar60/TableNet-pytorch
Pytorch Implementation of TableNet
ses4255/Versatile-OCR-Program
Multi-modal OCR pipeline optimized for ML training (text, figure, math, tables, diagrams)