1000-7/xinlp

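A Java implementation of the algorithms from the later chapters of Li Hang's "Statistical Learning Methods" (《统计学习方法》). It starts with the EM algorithm for the box-and-ball example and extends it to GMM training, then implements HMM word segmentation (including training of the HMM parameters) and CRF word segmentation (using model parameters trained with CRF++), and finally implements BiLSTM+CRF with TensorFlow. The result is packaged as a XinAnalyzer for Lucene.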

31 / 100 (Emerging)

This project helps developers integrate Chinese word segmentation directly into their applications. It takes raw Chinese text as input and splits it into individual words or meaningful segments, which can then feed search engines or other text analysis tools. It is aimed primarily at software developers building applications that need to process Chinese text.
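One of the approaches named in the project description is HMM-based segmentation. As a rough illustration of how that kind of segmenter turns characters into words, the sketch below runs Viterbi decoding over the common B/M/E/S tagging scheme (Begin, Middle, End, Single) and then cuts the text after every E or S tag. This is a generic textbook-style sketch, not code from xinlp: the class name `HmmSegmentSketch`, the method names, and the array layout of the parameters are all assumptions made for the example, and the model parameters are expected to come from a separate training step.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of HMM word segmentation; not taken from the xinlp source.
class HmmSegmentSketch {
    // Tag indices for the B/M/E/S scheme: Begin, Middle, End, Single.
    static final int B = 0, M = 1, E = 2, S = 3;

    /**
     * Viterbi decoding in log space.
     * start[t]    : log P(tag t at the first character)
     * trans[p][t] : log P(tag t | previous tag p)
     * emit[i][t]  : log P(character at position i | tag t), looked up in advance
     * Returns the highest-scoring tag sequence, one tag per character.
     */
    static int[] viterbi(double[] start, double[][] trans, double[][] emit) {
        int n = emit.length, k = start.length;
        if (n == 0) return new int[0];
        double[][] score = new double[n][k];
        int[][] back = new int[n][k];
        for (int t = 0; t < k; t++) score[0][t] = start[t] + emit[0][t];
        for (int i = 1; i < n; i++) {
            for (int t = 0; t < k; t++) {
                int best = 0;
                double bestScore = Double.NEGATIVE_INFINITY;
                for (int p = 0; p < k; p++) {
                    double s = score[i - 1][p] + trans[p][t];
                    if (s > bestScore) { bestScore = s; best = p; }
                }
                score[i][t] = bestScore + emit[i][t];
                back[i][t] = best; // remember the best predecessor tag
            }
        }
        // Pick the best final tag, then follow the back-pointers.
        int last = 0;
        for (int t = 1; t < k; t++) if (score[n - 1][t] > score[n - 1][last]) last = t;
        int[] path = new int[n];
        path[n - 1] = last;
        for (int i = n - 1; i > 0; i--) path[i - 1] = back[i][path[i]];
        return path;
    }

    /**
     * Cuts the text after every E or S tag.
     * Assumes one tag per Java char, which holds for characters in the
     * Basic Multilingual Plane (most common Chinese characters).
     */
    static List<String> tagsToWords(String text, int[] tags) {
        List<String> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            current.append(text.charAt(i));
            if (tags[i] == E || tags[i] == S) {
                words.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) words.add(current.toString()); // unterminated tail
        return words;
    }
}
```

With a tag sequence of S, S, B, E for the text 我爱北京, `tagsToWords` yields the three words 我, 爱 and 北京. Working in log space avoids numeric underflow on long sentences, and forbidden transitions such as B followed by B can be encoded as `Double.NEGATIVE_INFINITY`.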

No commits in the last 6 months.

Use this if you are a developer looking to implement robust Chinese word segmentation, especially for search functionalities within applications built on platforms like Lucene.

Not ideal if you are a non-developer seeking an off-the-shelf tool for general Chinese text analysis without coding.

Chinese NLP, Text Segmentation, Search Engine Development, Information Retrieval, Software Development
No License, Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 6 / 25
Maturity 8 / 25
Community 17 / 25


Stars: 23
Forks: 11
Language: Java
License: None
Last pushed: Jun 17, 2022
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/1000-7/xinlp"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
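For Java callers, the same request can be sketched with the JDK's built-in `java.net.http.HttpClient` (Java 11 or later). The class name `QualityApiSketch` and its method names are made up for this example, and the path pattern category/owner/repo is inferred from the single URL shown above rather than from any API documentation. The response format is not described on this page, so the sketch returns the raw body as a string.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper; only the endpoint URL comes from the page above.
class QualityApiSketch {
    static final String BASE = "https://pt-edge.onrender.com/api/v1/quality";

    // Builds a GET request; the category/owner/repo layout is an assumption
    // inferred from the example URL.
    static HttpRequest buildRequest(String category, String owner, String repo) {
        URI uri = URI.create(BASE + "/" + category + "/" + owner + "/" + repo);
        return HttpRequest.newBuilder(uri).GET().build();
    }

    // Sends the request and returns the raw response body.
    static String fetch(String category, String owner, String repo)
            throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response = client.send(
                buildRequest(category, owner, repo),
                HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }
}
```

Calling `QualityApiSketch.fetch("nlp", "1000-7", "xinlp")` issues the same request as the curl command. Given the 100 requests/day keyless limit, callers may want to cache the result rather than fetch on every use.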