1000-7/xinlp

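Java implementations of the algorithms from the later chapters of Li Hang's "Statistical Learning Methods": first the EM algorithm for the boxes-and-balls problem, then an extension of it to GMM training; later, HMM word segmentation (including training of the HMM parameters) and CRF word segmentation (reusing a parameter model trained with CRF++); finally, BiLSTM+CRF implemented with TensorFlow, plus a XinAnalyzer wrapper for Lucene.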


Emerging

This project helps developers integrate advanced Chinese natural language processing directly into their applications. It takes raw Chinese text as input and breaks it down into individual words or meaningful segments, which are then used to power search engines or other text analysis tools. It's primarily used by software developers building applications that require sophisticated Chinese text understanding.
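As a rough illustration of how the character-tagging segmenters named in the description (HMM, CRF, BiLSTM+CRF) can turn raw text into words, the sketch below decodes per-character tag scores with Viterbi and then cuts the text at word boundaries. It assumes the common B/M/E/S tagging scheme (begin, middle, end, single); the class name `BmesSegmenter`, its methods, and the scores are all hypothetical and are not taken from the xinlp code base.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the xinlp API: Viterbi decoding over B/M/E/S tags.
final class BmesSegmenter {
    // Tag indices: B = word begin, M = middle, E = end, S = single-character word.
    static final int B = 0, M = 1, E = 2, S = 3;

    // ALLOWED[prev][next] is true when tag `next` may directly follow tag `prev`.
    static final boolean[][] ALLOWED = {
        {false, true,  true,  false}, // B -> M or E
        {false, true,  true,  false}, // M -> M or E
        {true,  false, false, true }, // E -> B or S
        {true,  false, false, true }, // S -> B or S
    };

    /** score[i][t] is a finite log-domain score for giving character i the tag t. */
    static int[] viterbi(double[][] score) {
        int n = score.length;
        if (n == 0) return new int[0];
        double[][] best = new double[n][4];
        int[][] back = new int[n][4];
        for (int t = 0; t < 4; t++) {
            // A sentence can only open with B or S.
            best[0][t] = (t == B || t == S) ? score[0][t] : Double.NEGATIVE_INFINITY;
        }
        for (int i = 1; i < n; i++) {
            for (int t = 0; t < 4; t++) {
                best[i][t] = Double.NEGATIVE_INFINITY;
                for (int p = 0; p < 4; p++) {
                    if (!ALLOWED[p][t]) continue;
                    double cand = best[i - 1][p] + score[i][t];
                    if (cand > best[i][t]) {
                        best[i][t] = cand;
                        back[i][t] = p; // remember the best predecessor tag
                    }
                }
            }
        }
        // A sentence can only close with E or S.
        int last = best[n - 1][E] > best[n - 1][S] ? E : S;
        int[] tags = new int[n];
        tags[n - 1] = last;
        for (int i = n - 1; i > 0; i--) {
            tags[i - 1] = back[i][tags[i]];
        }
        return tags;
    }

    /** Cuts the text after every character tagged E or S. Assumes one char per character. */
    static List<String> toWords(String text, int[] tags) {
        List<String> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < tags.length; i++) {
            current.append(text.charAt(i));
            if (tags[i] == E || tags[i] == S) {
                words.add(current.toString());
                current.setLength(0);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        String text = "我爱北京";
        // Made-up scores: 0.0 for the tag we pretend a model prefers, -5.0 otherwise.
        int[] preferred = {S, S, B, E};
        double[][] score = new double[text.length()][4];
        for (int i = 0; i < text.length(); i++) {
            for (int t = 0; t < 4; t++) score[i][t] = (t == preferred[i]) ? 0.0 : -5.0;
        }
        System.out.println(toWords(text, viterbi(score))); // prints [我, 爱, 北京]
    }
}
```

In a real segmenter the scores would come from trained parameters (for an HMM, log emission plus log transition probabilities), and the resulting words would be emitted as tokens, for example by a Lucene analyzer.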

No commits in the last 6 months.

Use this if you are a developer looking to implement robust Chinese word segmentation, especially for search functionalities within applications built on platforms like Lucene.

Not ideal if you are a non-developer seeking an off-the-shelf tool for general Chinese text analysis without coding.

Chinese NLP, Text Segmentation, Search Engine Development, Information Retrieval, Software Development

No License, Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 6 / 25

Maturity 8 / 25

Community 17 / 25


Language: Java

License: —

Higher-rated alternatives

blmoistawinde/HarvestText

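Text mining and preprocessing toolkit (text cleaning, new word discovery, sentiment analysis, entity recognition and linking, keyword extraction, knowledge extraction, syntactic parsing, etc.), based on unsupervised or weakly supervised methods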

huspacy/huspacy

HuSpaCy: industrial-strength Hungarian natural language processing

bnosac/udpipe

R package for Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing Based...

BramVanroy/spacy_conll

Pipeline component for spaCy (and other spaCy-wrapped parsers such as spacy-stanza and...

polm/unidic-py

Unidic packaged for installation via pip.
