Elasticsearch: Finding substring matches

晋承嗣

2023-03-14

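Question:

I want to perform both exact word matches and partial word/substring matches. For example, if I search for "men's shaver", I should find "men's shaver" in the results. But if I search for only a fragment of a word, such as "en's", I should still find "men's shaver" in the results. I am using the following settings and mapping: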

Index settings:

PUT /my_index
{
    "settings": {
        "number_of_shards": 1, 
        "analysis": {
            "filter": {
                "autocomplete_filter": { 
                    "type":     "edge_ngram",
                    "min_gram": 1,
                    "max_gram": 20
                }
            },
            "analyzer": {
                "autocomplete": {
                    "type":      "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "autocomplete_filter" 
                    ]
                }
            }
        }
    }
}

Mapping:

PUT /my_index/my_type/_mapping
{
    "my_type": {
        "properties": {
            "name": {
                "type":            "string",
                "index_analyzer":  "autocomplete", 
                "search_analyzer": "standard" 
            }
        }
    }
}

Insert records:

POST /my_index/my_type/_bulk
{ "index": { "_id": 1            }}
{ "name": "men's shaver" }
{ "index": { "_id": 2            }}
{ "name": "women's shaver" }

Queries:

1. Search by exact word match -> "men's"

POST /my_index/my_type/_search
{
    "query": {
        "match": {
            "name": "men's"
        }
    }
}

The query above returns "men's shaver" in the results.

2. Search by partial word match -> "en's"

POST /my_index/my_type/_search
{
    "query": {
        "match": {
            "name": "en's"
        }
    }
}

The query above returns nothing.

I also tried the following query:

POST /my_index/my_type/_search
{
    "query": {
        "wildcard": {
           "name": {
              "value": "%en's%"
           }
        }
    }
}

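I still got nothing. I figured out that this is because the "edge_ngram" filter on the index only produces prefixes of each word, so it cannot find a partial word/substring match that starts in the middle of a word. I also tried the "n-gram" filter type, but it slows down searching considerably.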
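
To illustrate why "en's" is not found, here is a minimal Python sketch of what the two filter types emit for the token "men's". The helper names `edge_ngrams` and `ngrams` are hypothetical, and the functions only imitate the idea behind the filters; they are not the Lucene implementation.

```python
def edge_ngrams(token, min_gram=1, max_gram=20):
    """Return the prefixes of token, mimicking an edge_ngram filter."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


def ngrams(token, min_gram=1, max_gram=20):
    """Return every substring of token, mimicking an ngram filter."""
    upper = min(max_gram, len(token))
    return [
        token[i:i + n]
        for n in range(min_gram, upper + 1)
        for i in range(len(token) - n + 1)
    ]


indexed_prefixes = edge_ngrams("men's")
indexed_substrings = ngrams("men's")

print(indexed_prefixes)              # ['m', 'me', 'men', "men'", "men's"]
print("en's" in indexed_prefixes)    # False: "en's" is not a prefix
print("en's" in indexed_substrings)  # True: ngram also indexes inner substrings
print(len(indexed_prefixes), len(indexed_substrings))  # 5 15
```

The last line hints at the cost: the full n-gram filter stores many more terms per word than the edge variant, which is consistent with the larger index and slower searches described above.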

Please suggest how I can achieve both exact phrase matching and partial phrase matching with the same index settings.

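Answer:

To search for both partial field matches and exact matches, it works better to define the field as "not_analyzed" or as a keyword (rather than text) and then use a wildcard query.

To use a wildcard query, add * to both ends of the string you are searching for: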

POST /my_index/my_type/_search
{
    "query": {
        "wildcard": {
            "name": {
                "value": "*en's*"
            }
        }
    }
}

To make the search case-insensitive, use a custom analyzer with a lowercase filter and the keyword tokenizer.

Custom analyzer:

"custom_analyzer": {
    "tokenizer": "keyword",
    "filter": ["lowercase"]
}
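
As a rough sketch of where this analyzer fragment might sit in a complete index definition, the Python snippet below assembles a request body and prints it as JSON. The index name `my_index` and field `name` come from the question. The typeless `mappings` layout and the `text` field type are assumptions about a recent Elasticsearch version; older versions, such as the one used in the question, use a different mapping syntax.

```python
import json

# Hypothetical index body: the analyzer is taken from the answer,
# the surrounding settings/mappings layout is an assumption.
index_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                "custom_analyzer": {
                    "type": "custom",
                    "tokenizer": "keyword",   # whole field value becomes one token
                    "filter": ["lowercase"],  # token is lowercased at index time
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "name": {"type": "text", "analyzer": "custom_analyzer"}
        }
    },
}

# Print the request in console style so it can be pasted and adapted.
print("PUT /my_index")
print(json.dumps(index_body, indent=2))
```

With an analyzer like this, a value such as "Men's Shaver" would be indexed as the single lowercase term "men's shaver", which is what a wildcard pattern is then matched against.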

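Lowercase the search string

Because the indexed value is lowercased, the wildcard pattern should be lowercased as well. If you receive the search string AsD, change it to *asd*.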
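
The lowercasing step can be done in the client before the query is sent. The following is a minimal sketch; the function name `build_wildcard_query` is hypothetical, and it does not escape `*` or `?` characters that may already be present in the user's input.

```python
import json


def build_wildcard_query(field, user_input):
    """Lowercase the input and wrap it in * to form a wildcard query body."""
    pattern = "*" + user_input.lower() + "*"
    return {"query": {"wildcard": {field: {"value": pattern}}}}


body = build_wildcard_query("name", "AsD")
print(json.dumps(body))
# {"query": {"wildcard": {"name": {"value": "*asd*"}}}}
```

The resulting dictionary can be sent as the body of a search request against the index, for example with any HTTP client.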
