1000-7/xinlp
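A Java implementation of the algorithms from the later chapters of Li Hang's "Statistical Learning Methods" (《统计学习方法》): the EM algorithm for the boxes-and-balls problem, extended to GMM training; HMM word segmentation (including training of the HMM parameters); CRF word segmentation (using a parameter model trained with CRF++); and finally BiLSTM+CRF implemented with TensorFlow. The project also wraps a XinAnalyzer for Lucene.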
This project helps developers integrate advanced Chinese natural language processing directly into their applications. It takes raw Chinese text as input and breaks it down into individual words or meaningful segments, which are then used to power search engines or other text analysis tools. It's primarily used by software developers building applications that require sophisticated Chinese text understanding.
No commits in the last 6 months.
Use this if you are a developer looking to implement robust Chinese word segmentation, especially for search functionalities within applications built on platforms like Lucene.
Not ideal if you are a non-developer seeking an off-the-shelf tool for general Chinese text analysis without coding.
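To make the segmentation step described above concrete: character-tagging segmenters (HMM, CRF, and BiLSTM+CRF are commonly used this way) typically label each character as B (begin), M (middle), E (end), or S (single-character word), and the words are then read off from the tags. The sketch below shows only that final decoding step. The class name `BmesDecoder`, the BMES tag set, and the hard-coded tag string are illustrative assumptions, not xinlp's actual API or output.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of xinlp: converts per-character BMES tags into words.
public class BmesDecoder {

    // B = begin of word, M = middle, E = end, S = single-character word.
    public static List<String> decode(String text, String tags) {
        if (text.length() != tags.length()) {
            throw new IllegalArgumentException("text and tags must have the same length");
        }
        List<String> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char tag = tags.charAt(i);
            current.append(text.charAt(i));
            // E and S close the current word; B and M keep it open.
            if (tag == 'E' || tag == 'S') {
                words.add(current.toString());
                current.setLength(0);
            }
        }
        // Flush a trailing word left open by a malformed tag sequence.
        if (current.length() > 0) {
            words.add(current.toString());
        }
        return words;
    }

    public static void main(String[] args) {
        // Assumed tagger output for illustration; a trained model would produce the tags.
        System.out.println(decode("我爱自然语言处理", "SSBEBEBE"));
        // prints [我, 爱, 自然, 语言, 处理]
    }
}
```

In a real pipeline the tag string would come from the trained model (for example via Viterbi decoding over HMM or CRF scores), and the resulting words would be emitted as tokens by an analyzer.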
Stars: 23
Forks: 11
Language: Java
License: —
Category:
Last pushed: Jun 17, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/1000-7/xinlp"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
blmoistawinde/HarvestText
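A text mining and preprocessing toolkit (text cleaning, new word discovery, sentiment analysis, entity recognition and linking, keyword extraction, knowledge extraction, syntactic parsing, etc.) based on unsupervised or weakly supervised methods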
huspacy/huspacy
HuSpaCy: industrial-strength Hungarian natural language processing
bnosac/udpipe
R package for Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing Based...
BramVanroy/spacy_conll
Pipeline component for spaCy (and other spaCy-wrapped parsers such as spacy-stanza and...
polm/unidic-py
Unidic packaged for installation via pip.