ilyalasy/DOM-LM

Unofficial PyTorch implementation of the DOM-LM paper.

Emerging

This project helps NLP researchers and practitioners pre-train a language model designed for web document understanding. It takes raw web pages, such as those in the SWDE dataset, converts them into a format suitable for training, and then trains a masked language model on the result. The output is a specialized pre-trained model that can be fine-tuned for information extraction tasks on web content.
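The two stages described above (turning a raw page into tokens, then masking tokens for masked language modeling) can be illustrated with a minimal, standard-library-only sketch. This is not the repository's code: `TextNodeCollector`, `mask_tokens`, the `VOID_TAGS` set, and the sample HTML are hypothetical names and data chosen for illustration, and the masking shown is the simplest variant (every selected token becomes `[MASK]`), which may differ from what DOM-LM actually does.

```python
import random
from html.parser import HTMLParser

# Hypothetical helper set: tags that never have a closing tag.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class TextNodeCollector(HTMLParser):
    """Collect (tag_path, token) pairs for every text node in an HTML page."""

    def __init__(self):
        super().__init__()
        self.path = []    # stack of currently open tags
        self.tokens = []  # (tag_path, token) pairs

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.path.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        if tag in self.path:
            while self.path.pop() != tag:
                pass

    def handle_data(self, data):
        for token in data.split():
            self.tokens.append(("/".join(self.path), token))


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Replace a random subset of tokens with [MASK].

    Returns (masked, labels); labels hold the original token at masked
    positions and None elsewhere, so a model is scored only on masked slots.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for token in tokens:
        if rng.random() < mask_prob:
            masked.append("[MASK]")
            labels.append(token)
        else:
            masked.append(token)
            labels.append(None)
    return masked, labels


# Illustrative input, not taken from SWDE.
html = "<html><body><h1>Acme Phone</h1><p>Price: $199</p></body></html>"
collector = TextNodeCollector()
collector.feed(html)
paths, words = zip(*collector.tokens)
masked, labels = mask_tokens(list(words), mask_prob=0.5, seed=0)
```

Keeping the tag path next to each token reflects the general idea that a web-document model can use markup structure as well as text; how the actual project encodes that structure is not specified here.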

No commits in the last 6 months.

Use this if you are an NLP researcher or data scientist focused on understanding and extracting information from web documents, and you need a specialized pre-trained model for this domain.

Not ideal if you are looking for a general-purpose language model or a tool that is already fine-tuned for specific tasks like question answering or attribute extraction.

Web Data Extraction, Natural Language Processing, Information Retrieval, Text Mining, Machine Learning Research

Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 7 / 25

Maturity 16 / 25

Community 18 / 25

Language

Python

License

MIT

Higher-rated alternatives

xv44586/toolkit4nlp

Transformer implementations (architectures, task examples, serving, and more)

luozhouyang/transformers-keras

Transformer-based models implemented in TensorFlow 2.x (using Keras).

ufal/neuralmonkey

An open-source tool for sequence learning in NLP built on TensorFlow.

graykode/xlnet-Pytorch

Simple XLNet implementation with a PyTorch wrapper

uzaymacar/attention-mechanisms

Implementations for a family of attention mechanisms, suitable for all kinds of natural language...
