"nGram_filter": {
"type": "nGram",
"min_gram": 3,
"max_gram": 10,
"token_chars": [
"letter",
"digit",
"punctuation",
"symbol"
]
}
"nGram_analyzer": {
"type": "custom",
"tokenizer": "whitespace",
"filter": [
"lowercase",
"asciifolding",
"nGram_filter"
]
}
But I need to add another feature: prefix matching. For example, when I search for test_table (10 characters) I get results, because the maximum n-gram length is 10; but when I search for test_table_for, I get zero results, because the record test_table_for analyzers does not contain that token.

How can I add a prefix-based filter alongside the existing n-gram analyzer? I should still get matches for search strings of up to 10 characters (which currently works), and I should also get suggestions when the search string matches a record from its beginning.
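To see why the 10-character limit bites, here is a minimal Python sketch of the token set an nGram filter with min_gram 3 and max_gram 10 would produce for this term. This models the filter's behavior only; it is not Elasticsearch code, and it ignores the tokenizer and lowercasing steps:

```python
def ngrams(term: str, min_gram: int, max_gram: int) -> set:
    """All substrings of term whose length is between min_gram and max_gram."""
    return {
        term[i:i + n]
        for n in range(min_gram, max_gram + 1)
        for i in range(len(term) - n + 1)
    }

tokens = ngrams("test_table_for", min_gram=3, max_gram=10)
print("test_table" in tokens)      # → True: the 10-char query term is indexed
print("test_table_for" in tokens)  # → False: the 14-char term exceeds max_gram
```

The full 14-character term never becomes a token, so an exact query for it finds nothing.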
This is not possible in a single analyzer; you have to create another field on which you generate edge_ngram tokens for the prefix search. Below is an index mapping that also includes your current analyzer.
Index mapping:
{
    "settings": {
        "analysis": {
            "filter": {
                "autocomplete_filter": {
                    "type": "edge_ngram",
                    "min_gram": 1,
                    "max_gram": 30
                },
                "nGram_filter": {
                    "type": "nGram",
                    "min_gram": 3,
                    "max_gram": 10,
                    "token_chars": [
                        "letter",
                        "digit",
                        "punctuation",
                        "symbol"
                    ]
                }
            },
            "analyzer": {
                "prefixanalyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "autocomplete_filter"
                    ]
                },
                "ngramanalyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "nGram_filter"
                    ]
                }
            }
        },
        "index.max_ngram_diff": 30
    },
    "mappings": {
        "properties": {
            "title_prefix": {
                "type": "text",
                "analyzer": "prefixanalyzer",
                "search_analyzer": "standard"
            },
            "title": {
                "type": "text",
                "analyzer": "ngramanalyzer",
                "search_analyzer": "standard"
            }
        }
    }
}
You can now use the _analyze API to confirm the prefix tokens:
{
    "analyzer": "prefixanalyzer",
    "text": "test_table_for analyzers"
}
{"tokens":[{"token":"t","start_offset":0,"end_offset":14,"type":"<ALPHANUM>","position":0},{"token":"te","start_offset":0,"end_offset":14,"type":"<ALPHANUM>","position":0},{"token":"tes","start_offset":0,"end_offset":14,"type":"<ALPHANUM>","position":0},{"token":"test","start_offset":0,"end_offset":14,"type":"<ALPHANUM>","position":0},{"token":"test_","start_offset":0,"end_offset":14,"type":"<ALPHANUM>","position":0},{"token":"test_t","start_offset":0,"end_offset":14,"type":"<ALPHANUM>","position":0},{"token":"test_ta","start_offset":0,"end_offset":14,"type":"<ALPHANUM>","position":0},{"token":"test_tab","start_offset":0,"end_offset":14,"type":"<ALPHANUM>","position":0},{"token":"test_tabl","start_offset":0,"end_offset":14,"type":"<ALPHANUM>","position":0},{"token":"test_table","start_offset":0,"end_offset":14,"type":"<ALPHANUM>","position":0},{"token":"test_table_","start_offset":0,"end_offset":14,"type":"<ALPHANUM>","position":0},{"token":"test_table_f","start_offset":0,"end_offset":14,"type":"<ALPHANUM>","position":0},{"token":"test_table_fo","start_offset":0,"end_offset":14,"type":"<ALPHANUM>","position":0},{"token":"test_table_for","start_offset":0,"end_offset":14,"type":"<ALPHANUM>","position":0},{"token":"a","start_offset":15,"end_offset":24,"type":"<ALPHANUM>","position":1},{"token":"an","start_offset":15,"end_offset":24,"type":"<ALPHANUM>","position":1},{"token":"ana","start_offset":15,"end_offset":24,"type":"<ALPHANUM>","position":1},{"token":"anal","start_offset":15,"end_offset":24,"type":"<ALPHANUM>","position":1},{"token":"analy","start_offset":15,"end_offset":24,"type":"<ALPHANUM>","position":1},{"token":"analyz","start_offset":15,"end_offset":24,"type":"<ALPHANUM>","position":1},{"token":"analyze","start_offset":15,"end_offset":24,"type":"<ALPHANUM>","position":1},{"token":"analyzer","start_offset":15,"end_offset":24,"type":"<ALPHANUM>","position":1},{"token":"analyzers","start_offset":15,"end_offset":24,"type":"<ALPHANUM>","position":1}]}
Search query:

{
    "query": {
        "multi_match": {
            "query": "test_table_for",
            "fields": [
                "title",
                "title_prefix"
            ]
        }
    }
}
"hits": [
{
"_index": "so_63981157",
"_type": "_doc",
"_id": "1",
"_score": 0.45920232,
"_source": {
"title_prefix": "test_table_for analyzers",
"title": "test_table_for analyzers"
}
}
]
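Putting the two fields together: the full query term appears only in the edge-gram token set, which is why adding title_prefix to the multi_match makes the 14-character search succeed. A rough Python illustration of that matching logic (an assumed simplification of the analysis chain, applied to a single word):

```python
def ngrams(term: str, min_gram: int, max_gram: int) -> set:
    """All substrings with length between min_gram and max_gram."""
    return {term[i:i + n]
            for n in range(min_gram, max_gram + 1)
            for i in range(len(term) - n + 1)}

def edge_ngrams(term: str, min_gram: int, max_gram: int) -> set:
    """Prefixes with length between min_gram and max_gram."""
    return {term[:n] for n in range(min_gram, min(max_gram, len(term)) + 1)}

doc_word = "test_table_for"
title_tokens = ngrams(doc_word, 3, 10)         # models the "title" field (nGram_filter)
prefix_tokens = edge_ngrams(doc_word, 1, 30)   # models "title_prefix" (autocomplete_filter)

query = "test_table_for"
print(query in title_tokens)   # → False: "title" alone cannot match
print(query in prefix_tokens)  # → True: "title_prefix" supplies the match
```

Queries up to 10 characters still match through the n-gram field, so the existing behavior is preserved.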