himkt/awesome-bert-japanese

📝 A list of pre-trained BERT models for Japanese with word/subword tokenization + vocabulary construction algorithm information

Score: 27 / 100 (Experimental)

When working with Japanese text for natural language processing, you need to carefully choose how to break down sentences into words and subwords, as Japanese doesn't use spaces between words. This project provides a clear table comparing various pre-trained BERT models for Japanese, detailing the specific word segmentation, subword tokenization, and vocabulary construction algorithms used by each. Data scientists and NLP researchers who build or fine-tune models for Japanese text will find this resource useful.
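
For illustration, here is a minimal sketch (not part of this repo) of how one catalogued combination behaves in practice, assuming the transformers and fugashi/ipadic packages are installed. cl-tohoku/bert-base-japanese, one of the listed models, uses MeCab word segmentation followed by WordPiece subwords:

from transformers import AutoTokenizer

# MeCab (IPADic) word segmentation + WordPiece subwords,
# the combination documented for this model in the list.
tokenizer = AutoTokenizer.from_pretrained("cl-tohoku/bert-base-japanese")

# Japanese has no spaces between words, so the tokenizer
# performs the word splitting itself before subword splitting.
print(tokenizer.tokenize("自然言語処理は楽しい。"))
# e.g. ['自然', '言語', '処理', 'は', '楽しい', '。']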

131 stars. No commits in the last 6 months.

Use this if you need to select the most appropriate pre-trained BERT model for your Japanese NLP task and want to understand the linguistic processing choices made in its creation.

Not ideal if you are looking for ready-to-use APIs or code implementations of these models, as this project is a comparison guide, not a model library.

Tags: Japanese NLP, text processing, BERT models, linguistic analysis, machine learning
Badges: No License, Stale (6m), No Package, No Dependents
Score breakdown:
Maintenance: 0 / 25
Adoption: 10 / 25
Maturity: 8 / 25
Community: 9 / 25

Stars: 131
Forks: 7
Language: n/a
License: none
Last pushed: Mar 15, 2023
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/himkt/awesome-bert-japanese"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
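
If you prefer to consume the endpoint from code, here is a minimal Python sketch using only the standard library; it assumes the endpoint returns a JSON body, as the response schema is not documented on this page:

import json
import urllib.request

URL = "https://pt-edge.onrender.com/api/v1/quality/nlp/himkt/awesome-bert-japanese"

# Fetch the quality data for this repo; assumes a JSON response.
with urllib.request.urlopen(URL) as resp:
    data = json.load(resp)

# Pretty-print the result, keeping any non-ASCII characters readable.
print(json.dumps(data, indent=2, ensure_ascii=False))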