elasticsearch-jieba-plugin
jieba analysis plugin for elasticsearch: 7.7.0, 7.4.2, 7.3.0, 7.0.0, 6.4.0, 6.0.0, 5.4.0, 5.3.0, 5.2.2, 5.2.1, 5.2.0, 5.1.2, 5.1.1
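Features
Dictionaries can be added dynamically, without restarting ES.
With minor changes, the plugin can be adapted to different ES versions.
Notes on using jieba_index and jieba_search
New tokenizer support
If you are running ES 6.4.0, use the latest code from the 6.4.0 branch or the master branch, or download the 6.4.1 release. Upgrading is strongly recommended.
The 6.4.1 release fixes a PositionIncrement problem. For details, see the article "ES分词PositionIncrement解析" (an explanation of PositionIncrement in ES tokenization).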
Version mapping

| Branch | Tag | Elasticsearch version | Release link |
| --- | --- | --- | --- |
| 7.7.0 | tag v7.7.1 | v7.7.0 | Download: v7.7.0 |
| 7.4.2 | tag v7.4.2 | v7.4.2 | Download: v7.4.2 |
| 7.3.0 | tag v7.3.0 | v7.3.0 | Download: v7.3.0 |
| 7.0.0 | tag v7.0.0 | v7.0.0 | Download: v7.0.0 |
| 6.4.0 | tag v6.4.1 | v6.4.0 | Download: v6.4.1 |
| 6.4.0 | tag v6.4.0 | v6.4.0 | Download: v6.4.0 |
| 6.0.0 | tag v6.0.0 | v6.0.0 | Download: v6.0.1 |
| 5.4.0 | tag v5.4.0 | v5.4.0 | Download: v5.4.0 |
| 5.3.0 | tag v5.3.0 | v5.3.0 | Download: v5.3.0 |
| 5.2.2 | tag v5.2.2 | v5.2.2 | Download: v5.2.2 |
| 5.2.1 | tag v5.2.1 | v5.2.1 | Download: v5.2.1 |
| 5.2 | tag v5.2.0 | v5.2.0 | Download: v5.2.0 |
| 5.1.2 | tag v5.1.2 | v5.1.2 | Download: v5.1.2 |
| 5.1.1 | tag v5.1.1 | v5.1.1 | Download: v5.1.1 |
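More details
Choose the source code version that matches your Elasticsearch version.
Clone the repository and build the plugin:
git clone https://github.com/sing1ee/elasticsearch-jieba-plugin.git --recursive
cd elasticsearch-jieba-plugin
./gradlew clean pz
Copy the zip file to the plugins directory (replace 5.1.2 with the version you built):
cp build/distributions/elasticsearch-jieba-plugin-5.1.2.zip ${path.home}/plugins
Unzip the archive and remove the zip file:
cd ${path.home}/plugins
unzip elasticsearch-jieba-plugin-5.1.2.zip
rm elasticsearch-jieba-plugin-5.1.2.zip
Start Elasticsearch:
cd ${path.home}
./bin/elasticsearch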
Custom User Dict
Put your dictionary file, with the suffix .dict, into ${path.home}/plugins/jieba/dic. Each line holds a word followed by its frequency, like this:
小清新 3
百搭 3
显瘦 3
隨身碟 100
your_word word_freq
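Before dropping a new .dict file into the plugin directory, it can help to check that every line follows the `your_word word_freq` shape shown above. The following is a minimal sketch, not part of the plugin: the function name `parse_user_dict` is hypothetical, and it assumes exactly two whitespace-separated fields per line with an integer frequency. The plugin's own loader may be more lenient than this check.

```python
from typing import List, Tuple


def parse_user_dict(text: str) -> List[Tuple[str, int]]:
    """Parse 'word freq' lines; raise ValueError on a malformed line.

    Assumption: two whitespace-separated fields per line, as in the sample.
    """
    entries = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        parts = line.split()
        if len(parts) != 2 or not parts[1].isdecimal():
            raise ValueError(f"line {lineno}: expected '<word> <freq>', got {raw!r}")
        entries.append((parts[0], int(parts[1])))
    return entries


# The sample entries from the section above.
sample = "小清新 3\n百搭 3\n显瘦 3\n隨身碟 100\n"
entries = parse_user_dict(sample)
print(entries)
```

Running a check like this first means a typo in the dictionary is reported with its line number rather than being discovered later through unexpected tokenization.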
Using stopwords
Find stopwords.txt in ${path.home}/plugins/jieba/dic.
Create a folder named stopwords under ${path.home}/config:
mkdir -p ${path.home}/config/stopwords
Copy stopwords.txt into the folder just created:
cp ${path.home}/plugins/jieba/dic/stopwords.txt ${path.home}/config/stopwords
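The same two steps can be scripted, which may be convenient when provisioning several nodes. This is a rough sketch under the directory layout described above; the function name `install_stopwords` and the `es_home` parameter (standing in for ${path.home}) are hypothetical and not part of the plugin.

```python
import shutil
from pathlib import Path


def install_stopwords(es_home: Path) -> Path:
    """Copy the plugin's stopwords.txt into <es_home>/config/stopwords.

    es_home plays the role of ${path.home}; returns the path of the copy.
    """
    src = es_home / "plugins" / "jieba" / "dic" / "stopwords.txt"
    if not src.is_file():
        raise FileNotFoundError(f"stopwords file not found: {src}")
    dest_dir = es_home / "config" / "stopwords"
    dest_dir.mkdir(parents=True, exist_ok=True)  # same effect as mkdir -p
    return Path(shutil.copy2(src, dest_dir))
```

A call such as `install_stopwords(Path("/path/to/elasticsearch"))` would then mirror the `mkdir` and `cp` commands; the path here is a placeholder for your own installation directory.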
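Create the index. The jieba_synonym filter in this example reads synonyms/synonyms.txt, so that file must also exist under ${path.home}/config: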
PUT http://localhost:9200/jieba_index
{
  "settings": {
    "analysis": {
      "filter": {
        "jieba_stop": {
          "type": "stop",
          "stopwords_path": "stopwords/stopwords.txt"
        },
        "jieba_synonym": {
          "type": "synonym",
          "synonyms_path": "synonyms/synonyms.txt"
        }
      },
      "analyzer": {
        "my_ana": {
          "tokenizer": "jieba_index",
          "filter": [
            "lowercase",
            "jieba_stop",
            "jieba_synonym"
          ]
        }
      }
    }
  }
}
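For readers who prefer to script index creation, the request above can be assembled with the Python standard library. This is only a sketch: the helper name `build_create_index_request` is hypothetical, the base URL and index name are the ones used in this README, and the request is built but not sent so that the example does not depend on a running node.

```python
import json
import urllib.request


def build_create_index_request(base_url: str, index: str) -> urllib.request.Request:
    """Build (but do not send) the PUT request that creates the index."""
    body = {
        "settings": {
            "analysis": {
                "filter": {
                    "jieba_stop": {
                        "type": "stop",
                        "stopwords_path": "stopwords/stopwords.txt",
                    },
                    "jieba_synonym": {
                        "type": "synonym",
                        "synonyms_path": "synonyms/synonyms.txt",
                    },
                },
                "analyzer": {
                    "my_ana": {
                        "tokenizer": "jieba_index",
                        "filter": ["lowercase", "jieba_stop", "jieba_synonym"],
                    }
                },
            }
        }
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = build_create_index_request("http://localhost:9200", "jieba_index")
print(req.get_method(), req.full_url)

# To send it against a running node with the plugin installed:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

The explicit Content-Type header is included because Elasticsearch 6.0 and later reject request bodies that do not declare one.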
Test the analyzer:
POST http://localhost:9200/jieba_index/_analyze
{
  "analyzer": "my_ana",
  "text": "黄河之水天上来"
}
The response is as follows:
{
  "tokens": [
    {
      "token": "黄河",
      "start_offset": 0,
      "end_offset": 2,
      "type": "word",
      "position": 0
    },
    {
      "token": "黄河之水天上来",
      "start_offset": 0,
      "end_offset": 7,
      "type": "word",
      "position": 0
    },
    {
      "token": "之水",
      "start_offset": 2,
      "end_offset": 4,
      "type": "word",
      "position": 1
    },
    {
      "token": "天上",
      "start_offset": 4,
      "end_offset": 6,
      "type": "word",
      "position": 2
    },
    {
      "token": "上来",
      "start_offset": 5,
      "end_offset": 7,
      "type": "word",
      "position": 2
    }
  ]
}
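To make the response easier to read, the tokens can be grouped by their position value; tokens that share a position are overlapping alternatives over the same part of the text. The sketch below works on the response shown above, copied in as a literal, and the helper name `group_by_position` is hypothetical. It also checks that each token matches the slice of the input text given by its offsets.

```python
from collections import defaultdict

text = "黄河之水天上来"

# The response shown above, reduced to the fields used here.
response = {
    "tokens": [
        {"token": "黄河", "start_offset": 0, "end_offset": 2, "position": 0},
        {"token": "黄河之水天上来", "start_offset": 0, "end_offset": 7, "position": 0},
        {"token": "之水", "start_offset": 2, "end_offset": 4, "position": 1},
        {"token": "天上", "start_offset": 4, "end_offset": 6, "position": 2},
        {"token": "上来", "start_offset": 5, "end_offset": 7, "position": 2},
    ]
}


def group_by_position(tokens):
    """Map each position to the list of tokens emitted at that position."""
    groups = defaultdict(list)
    for tok in tokens:
        groups[tok["position"]].append(tok["token"])
    return dict(sorted(groups.items()))


grouped = group_by_position(response["tokens"])
for position, words in grouped.items():
    print(position, words)

# Each token should equal the substring its offsets point at.
offsets_ok = all(
    text[t["start_offset"]:t["end_offset"]] == t["token"]
    for t in response["tokens"]
)
print("offsets consistent:", offsets_ok)
```

In this sample, position 0 holds both 黄河 and the full phrase, and position 2 holds both 天上 and 上来, which is the overlapping output the jieba_index tokenizer produced for this input.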
NOTE
The code is migrated from jieba-solr.
Roadmap
I plan to add support for more analyzers:
Stanford Chinese analyzer
Fudan NLP analyzer
...
If you have ideas, please open an issue and we can work on them together.