YRL-AIDA/RuTaBERT
RuTaBERT is a framework for solving column type and property annotation problems based on fine-tuning a pre-trained language model (e.g., BERT) using a large-scale corpus of Russian-language tables.
This project helps data professionals, data scientists, and analysts working with large volumes of Russian-language tables. It automatically identifies the type or category of data within each column (e.g., city, date, product ID). You input a CSV file containing your Russian table data, and it outputs labels for each column, telling you what kind of information it holds. This saves significant manual effort in data preparation and understanding.
No commits in the last 6 months.
Use this if you need to categorize the columns of many Russian-language tables quickly and accurately, without inspecting them by hand.
Not ideal if your tables are primarily in languages other than Russian, or if you require column type annotation for highly specialized, non-standard data types not typically found in general knowledge bases.
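The CSV-in, labels-out workflow described above starts with splitting a table into columns. The sketch below shows only that preprocessing step, using Python's standard library. It is not RuTaBERT's own code: the function name `columns_from_csv`, the `max_cells` limit, and the choice to join cell values with spaces are all assumptions made for illustration, loosely following how BERT-style column annotators are commonly fed text.

```python
import csv
import io


def columns_from_csv(csv_text: str, max_cells: int = 5) -> dict[str, str]:
    """Map each header to a space-joined sample of that column's cell values.

    Hypothetical helper: the serialization format is an assumption,
    not the format RuTaBERT itself uses.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return {}
    header, body = rows[0], rows[1:]
    columns = {}
    for index, name in enumerate(header):
        # Take at most max_cells rows and skip missing or empty cells.
        cells = [
            row[index]
            for row in body[:max_cells]
            if index < len(row) and row[index]
        ]
        columns[name] = " ".join(cells)
    return columns


# Tiny Russian-language table: a city column and a date column.
sample = "город,дата\nМосква,2024-01-05\nКазань,2024-02-11\n"
serialized = columns_from_csv(sample)
print(serialized)
```

Each resulting string is the kind of per-column text a fine-tuned language model could classify into a label such as city or date. The classification step is deliberately left out, since the page does not describe the project's inference interface.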
Stars: 7
Forks: 1
Language: Python
License: MIT
Category:
Last pushed: Mar 27, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/YRL-AIDA/RuTaBERT"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
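The same request can be made from Python. This is a minimal sketch using only the standard library. The URL layout (`category/owner/repo`) is inferred from the single curl example above, and the assumption that the endpoint returns JSON is not confirmed by this page. The helper names `build_quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL; the path layout is inferred from the curl example."""
    parts = [urllib.parse.quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch and decode the response, assuming (unverified) that it is JSON."""
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the URL needs no network access.
url = build_quality_url("transformers", "YRL-AIDA", "RuTaBERT")
print(url)

# To actually call the API (counts against the daily request limit):
# data = fetch_quality("transformers", "YRL-AIDA", "RuTaBERT")
```

Keeping URL construction separate from the network call makes it easy to check the request before spending any of the 100 keyless requests per day.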
Higher-rated alternatives
Tongjilibo/bert4torch
An elegant PyTorch implementation of transformers
nyu-mll/jiant
jiant is an NLP toolkit
lonePatient/TorchBlocks
A PyTorch-based toolkit for natural language processing
monologg/JointBERT
PyTorch implementation of JointBERT: "BERT for Joint Intent Classification and Slot Filling"
grammarly/gector
Official implementation of the papers "GECToR – Grammatical Error Correction: Tag, Not Rewrite"...