ilyalasy/DOM-LM

Unofficial PyTorch implementation of the DOM-LM paper.

Score: 41 / 100 (Emerging)

This project helps NLP researchers and practitioners pre-train a language model designed for web document understanding. It takes raw web pages, such as those in the SWDE dataset, converts them into a format suitable for training, and then trains a masked language model. The result is a specialized pre-trained model that can be fine-tuned for information extraction tasks on web content.
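The flow described above (raw HTML in, masked-language-model training examples out) can be illustrated with a minimal sketch. It is not the repository's code: the class `TextNodeCollector`, the function `mask_tokens`, the `[MASK]` symbol, whitespace tokenization, and the sample HTML are all assumptions made for illustration, and only the Python standard library is used.

```python
import random
from html.parser import HTMLParser

MASK = "[MASK]"  # hypothetical mask symbol for this sketch
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}  # tags with no closing tag


class TextNodeCollector(HTMLParser):
    """Collect (tag_path, text) pairs for every non-empty text node."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags, outermost first
        self.nodes = []   # collected (tag_path, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating sloppy HTML.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.nodes.append(("/".join(self.stack), text))


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Replace a random subset of tokens with MASK; labels keep the originals."""
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK)
            labels.append(tok)    # the model would be trained to predict this
        else:
            inputs.append(tok)
            labels.append(None)   # position ignored by the training loss
    return inputs, labels


# Made-up sample page, used only to exercise the sketch.
html = "<html><body><h1>Acme Phone</h1><span>Price: $199</span></body></html>"
collector = TextNodeCollector()
collector.feed(html)
collector.close()

# Naive whitespace tokenization; a real pipeline would use a subword tokenizer.
tokens = [word for _, text in collector.nodes for word in text.split()]
inputs, labels = mask_tokens(tokens, mask_prob=0.3, seed=1)
```

Keeping the tag path next to each text node hints at why a DOM-aware model differs from a plain-text one: the same string can mean different things under `h1` and under `span`.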

No commits in the last 6 months.

Use this if you are an NLP researcher or data scientist focused on understanding and extracting information from web documents, and you need a specialized pre-trained model for this domain.

Not ideal if you are looking for a general-purpose language model or a tool that is already fine-tuned for specific tasks like question answering or attribute extraction.

Tags: Web Data Extraction, Natural Language Processing, Information Retrieval, Text Mining, Machine Learning Research
Status: Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 7 / 25
Maturity 16 / 25
Community 18 / 25


Stars: 33
Forks: 12
Language: Python
License: MIT
Last pushed: Mar 06, 2023
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/ilyalasy/DOM-LM"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
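The same request can be made from Python. The sketch below is a minimal example using only the standard library; the helper names `build_url` and `fetch_quality` are hypothetical, treating the `nlp` path segment as a category is an assumption read off the URL above, and the response is assumed (not confirmed by this page) to be JSON.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_url(category, owner, repo):
    """Build the endpoint URL; 'category' is our name for the 'nlp' segment."""
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category, owner, repo, timeout=10):
    """Fetch the quality data. Assumes the body is JSON, which is unverified."""
    with urlopen(build_url(category, owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example call (performs a network request, so it is left commented out):
# data = fetch_quality("nlp", "ilyalasy", "DOM-LM")
```

With the keyless limit of 100 requests/day stated above, caching the result locally is a sensible precaution when polling more than a handful of repositories.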