ilyalasy/DOM-LM
Unofficial PyTorch implementation of the DOM-LM paper.
This project lets NLP researchers and practitioners pre-train a language model designed for web document understanding. It takes raw web pages, such as those in the SWDE dataset, preprocesses them into model-ready inputs, and then trains a masked language model on them. The result is a pre-trained model that can be fine-tuned for information extraction tasks on web content.
No commits in the last 6 months.
Use this if you are an NLP researcher or data scientist focused on understanding and extracting information from web documents, and you need a specialized pre-trained model for this domain.
Not ideal if you are looking for a general-purpose language model or a tool that is already fine-tuned for specific tasks like question answering or attribute extraction.
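The pipeline described above (raw page, then model-ready tokens, then masked-LM inputs) can be illustrated with a minimal sketch. It is not taken from this repository: `extract_tokens`, `mask_tokens`, `VOID_TAGS`, the sample HTML, and the 15% masking rate are all assumptions chosen for illustration, and the sketch uses only the Python standard library, whereas the actual project is built on PyTorch.

```python
import random
from html.parser import HTMLParser

MASK = "[MASK]"
# Tags that never have a closing tag; they are not tracked in the tag path.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class TextNodeCollector(HTMLParser):
    """Collect (tag_path, word) pairs for the text nodes of a page."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags, outermost first
        self.tokens = []  # (tag_path, word) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        path = "/".join(self.stack)
        for word in data.split():
            self.tokens.append((path, word))


def extract_tokens(html):
    """Flatten an HTML string into (tag_path, word) pairs."""
    collector = TextNodeCollector()
    collector.feed(html)
    collector.close()  # flush any buffered trailing text
    return collector.tokens


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Replace a random subset of words with [MASK].

    Returns (inputs, labels); labels holds the original word at masked
    positions and None elsewhere, so a loss would only cover masked slots.
    """
    rng = random.Random(seed)
    inputs, labels = [], []
    for path, word in tokens:
        if rng.random() < mask_prob:
            inputs.append((path, MASK))
            labels.append(word)
        else:
            inputs.append((path, word))
            labels.append(None)
    return inputs, labels


if __name__ == "__main__":
    page = "<html><body><h1>Acme Phone</h1><p>Price: <b>$99</b></p></body></html>"
    tokens = extract_tokens(page)
    inputs, labels = mask_tokens(tokens, mask_prob=0.5)
    for (path, word), label in zip(inputs, labels):
        print(path, word, label)
```

Keeping the tag path next to each word is one simple way to preserve some DOM structure alongside the text; a real pre-training setup would map both to integer IDs and feed them to the model as tensors.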
Stars: 33
Forks: 12
Language: Python
License: MIT
Category:
Last pushed: Mar 06, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/ilyalasy/DOM-LM"
The API is open to everyone: 100 requests/day without a key, or 1,000/day with a free key.
Higher-rated alternatives
xv44586/toolkit4nlp
Transformers implementation (architectures, task examples, serving, and more)
luozhouyang/transformers-keras
Transformer-based models implemented in TensorFlow 2.x (using Keras).
ufal/neuralmonkey
An open-source tool for sequence learning in NLP built on TensorFlow.
graykode/xlnet-Pytorch
Simple XLNet implementation with a PyTorch wrapper
uzaymacar/attention-mechanisms
Implementations for a family of attention mechanisms, suitable for all kinds of natural language...