The first error I hit when indexing data in ES 5.1 came from my completion suggester mapping, which contained an output field.
message [MapperParsingException[failed to parse]; nested: IllegalArgumentException[unknown field name [output], must be one of [input, weight, contexts]];]
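So I removed it, but now many of my autocompletions are wrong, because the suggester returns the input that matched instead of the single output String.
After some googling I found this article from ES, which says the following:
As suggestions are document-oriented, suggestion metadata (e.g. output) should now be specified as a field in the document. The support for specifying output when indexing suggestion entries has been removed. Now suggestion result entry's text is always the un-analyzed value of the suggestion's input (same as not specifying output while indexing suggestions in pre-5.0 indices).
I found that the original value is available in the _source field returned with the suggestion, but that is not really a solution for me, because the keys and structure vary depending on the originating object.
I could add an extra "output" field to the original object, but that is not a solution for me either, because in some cases I have a structure like this: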
{
"id": "c2358e0c-7399-4665-ac2c-0bdd44597ac0",
"synonyms": ["All available colours", "Colors"],
"autoComplete": [{
"input": ["colours available all", "available colours all", "available all colours", "colours all available", "all available colours", "all colours available"]
}, {
"input": ["colors"]
}]
}
In ES 2.4 the structure looked like this:
{
"id": "c2358e0c-7399-4665-ac2c-0bdd44597ac0",
"synonyms": ["All available colours", "Colors"],
"SmartSynonym": [{
"input": ["colours available all", "available colours all", "available all colours", "colours all available", "all available colours", "all colours available"],
"output": ["All available colours"]
}, {
"input": ["colors"],
"output": ["Colors"]
}]
}
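This worked without any problems as long as every autocomplete object carried its own "output" field.
How can I get ES 5.1 to return the original value (e.g. "All available colours") when the user asks for "colours available all", in a simple way and without a lot of manual lookups?
We ended up dropping the custom plugin from the original answer, because it was hard to get it working on Elastic Cloud. Instead, we created a separate document just for autocompletion and removed the autocomplete data from all other documents.
The objects: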
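// Assumption: @JsonCreator is the Jackson 2.x annotation
import com.fasterxml.jackson.annotation.JsonCreator;
import java.util.Map;
public class Suggest {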
/*
* Contains the actual value it needs to return
* iphone 8 plus, plus iphone 8, 8 plus iphone, ...
* will all result into iphone 8 plus for example
*/
private String autocompleteOutput;
/*
* Contains the field and all the values of that field to autocomplete
*/
private Map<String, AutoComplete> autoComplete;
@JsonCreator
Suggest() {
}
public Suggest(String autocompleteOutput, Map<String, AutoComplete> autoComplete) {
this.autocompleteOutput = autocompleteOutput;
this.autoComplete = autoComplete;
}
public String getAutocompleteOutput() {
return autocompleteOutput;
}
public void setAutocompleteOutput(String autocompleteOutput) {
this.autocompleteOutput = autocompleteOutput;
}
public Map<String, AutoComplete> getAutoComplete() {
return autoComplete;
}
public void setAutoComplete(Map<String, AutoComplete> autoComplete) {
this.autoComplete = autoComplete;
}
}
public class AutoComplete {
/*
* Contains the permutation values from the Lucene filter (see the original answer)
*/
private String[] input;
@JsonCreator
AutoComplete() {
}
public AutoComplete(String[] input) {
this.input = input;
}
public String[] getInput() {
return input;
}
}
With the following mapping:
{
"suggest": {
"dynamic_templates": [
{
"autocomplete": {
"path_match": "autoComplete.*",
"match_mapping_type": "*",
"mapping": {
"type": "completion",
"analyzer": "lowercase_keyword_analyzer"
}
}
}
],
"properties": {}
}
}
This lets us use the autocompleteOutput field from _source.
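As a rough illustration of reading that field back, the sketch below walks a suggest response that has already been parsed into nested maps and collects each option's `_source.autocompleteOutput`. The class name `SuggestOutputReader`, the suggestion name `my-suggestion`, and the hand-built sample response are assumptions made for the example; only the `suggest` / `options` / `_source` nesting follows the response layout of the 5.x completion suggester.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SuggestOutputReader {

    /** Collects the autocompleteOutput value of every option under the named suggestion. */
    @SuppressWarnings("unchecked")
    static List<String> readOutputs(Map<String, Object> response, String suggestionName) {
        List<String> outputs = new ArrayList<>();
        Map<String, Object> suggest = (Map<String, Object>) response.get("suggest");
        if (suggest == null) {
            return outputs;
        }
        List<Map<String, Object>> entries = (List<Map<String, Object>>) suggest.get(suggestionName);
        if (entries == null) {
            return outputs;
        }
        for (Map<String, Object> entry : entries) {
            List<Map<String, Object>> options = (List<Map<String, Object>>) entry.get("options");
            if (options == null) {
                continue;
            }
            for (Map<String, Object> option : options) {
                // Each option carries the _source of the document it came from
                Map<String, Object> source = (Map<String, Object>) option.get("_source");
                if (source != null && source.get("autocompleteOutput") != null) {
                    outputs.add(source.get("autocompleteOutput").toString());
                }
            }
        }
        return outputs;
    }

    /** Hand-built stand-in for a parsed response (hypothetical data, not real ES output). */
    static Map<String, Object> sampleResponse() {
        Map<String, Object> source = new HashMap<>();
        source.put("autocompleteOutput", "iphone 8 plus");

        Map<String, Object> option = new HashMap<>();
        option.put("text", "plus iphone 8"); // the permuted input that matched
        option.put("_source", source);

        Map<String, Object> entry = new HashMap<>();
        entry.put("text", "plus ip"); // what the user typed
        entry.put("options", Collections.singletonList(option));

        Map<String, Object> suggest = new HashMap<>();
        suggest.put("my-suggestion", Collections.singletonList(entry));

        Map<String, Object> response = new HashMap<>();
        response.put("suggest", suggest);
        return response;
    }

    public static void main(String[] args) {
        System.out.println(readOutputs(sampleResponse(), "my-suggestion"));
    }
}
```

Because every suggestion document has the same shape, the lookup key is always `autocompleteOutput`, which avoids the varying keys and structures mentioned in the question.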
Original answer: after some research, I ended up creating a new Elasticsearch 5.1.1 plugin.
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionLengthAttribute;
import java.io.IOException;
import java.util.*;
/**
* Created by glenn on 13.01.17.
*/
public class PermutationTokenFilter extends TokenFilter {
private final CharTermAttribute charTermAtt;
private final PositionIncrementAttribute posIncrAtt;
private final OffsetAttribute offsetAtt;
private Iterator<String> permutations;
private int origOffset;
/**
* Construct a token stream filtering the given input.
*
* @param input
*/
protected PermutationTokenFilter(TokenStream input) {
super(input);
this.charTermAtt = addAttribute(CharTermAttribute.class);
this.posIncrAtt = addAttribute(PositionIncrementAttribute.class);
this.offsetAtt = addAttribute(OffsetAttribute.class);
}
@Override
public final boolean incrementToken() throws IOException {
while (true) {
//see if permutations have been created already
if (permutations == null) {
//see if more tokens are available
if (!input.incrementToken()) {
return false;
} else {
//Get value
String value = String.valueOf(charTermAtt);
//permute over buffer value and create iterator
permutations = permutation(value).iterator();
origOffset = posIncrAtt.getPositionIncrement();
}
}
//see if there are remaining permutations
if (permutations.hasNext()) {
//Reset the attribute to starting point
clearAttributes();
//use the next permutation
String permutation = permutations.next();
//add the permutation to the attributes and remove old attributes
charTermAtt.setEmpty().append(permutation);
posIncrAtt.setPositionIncrement(origOffset);
offsetAtt.setOffset(0,permutation.length());
//remove permutation from iterator
permutations.remove();
origOffset = 0;
return true;
}
permutations = null;
}
}
/**
* Changes the order of a multi value keyword so the completion suggester still knows the original value without
* tokenizing it if the users asks the words in a different order.
*
* @param value unpermuted value ex: Yellow Crazy Banana
* @return Permuted values ex:
* Yellow Crazy Banana,
* Yellow Banana Crazy,
* Crazy Yellow Banana,
* Crazy Banana Yellow,
* Banana Crazy Yellow,
* Banana Yellow Crazy
*/
private Set<String> permutation(String value) {
value = value.trim().replaceAll(" +", " ");
// Use sets to eliminate semantic duplicates (a a b is still a a b even if you switch the two 'a's in case one word occurs multiple times in a single value)
// Switch to HashSet for better performance
Set<String> set = new HashSet<String>();
String[] words = value.split(" ");
// Termination condition: only 1 permutation for a array of 1 word
if (words.length == 1) {
set.add(value);
} else if (words.length <= 6) {
// Give each word a chance to be the first in the permuted array
for (int i = 0; i < words.length; i++) {
// Remove the word at index i from the array
String pre = "";
for (int j = 0; j < i; j++) {
pre += words[j] + " ";
}
String post = " ";
for (int j = i + 1; j < words.length; j++) {
post += words[j] + " ";
}
String remaining = (pre + post).trim();
// Recurse to find all the permutations of the remaining words
for (String permutation : permutation(remaining)) {
// Concatenate the first word with the permutations of the remaining words
set.add(words[i] + " " + permutation);
}
}
} else {
Collections.addAll(set, words);
set.add(value);
}
return set;
}
}
This filter takes the original input token "All available colours" and permutes it into every possible word order (see the original question).
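To see the permutation step in isolation, here is a minimal sketch without any Lucene dependency. The class `WordPermutations` and its methods are made up for this illustration; it uses backtracking instead of the string concatenation in the filter above, and it leaves out the filter's six-word cap, so it should only be fed short values.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class WordPermutations {

    /** Returns every distinct ordering of the whitespace-separated words in value. */
    static Set<String> permute(String value) {
        String[] words = value.trim().split("\\s+");
        // A set drops duplicate orderings when the same word occurs more than once
        Set<String> result = new LinkedHashSet<>();
        build(new ArrayList<>(), words, new boolean[words.length], result);
        return result;
    }

    private static void build(List<String> current, String[] words, boolean[] used, Set<String> result) {
        if (current.size() == words.length) {
            result.add(String.join(" ", current));
            return;
        }
        for (int i = 0; i < words.length; i++) {
            if (used[i]) {
                continue;
            }
            // Place word i at the next position, recurse, then undo the choice
            used[i] = true;
            current.add(words[i]);
            build(current, words, used, result);
            current.remove(current.size() - 1);
            used[i] = false;
        }
    }

    public static void main(String[] args) {
        for (String permutation : permute("All available colours")) {
            System.out.println(permutation);
        }
    }
}
```

A three-word value yields six orderings, and the count grows factorially with the number of words, which is why the filter falls back to indexing the individual words plus the full value once there are more than six.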
import org.apache.lucene.analysis.TokenStream;
import org.elasticsearch.index.analysis.AbstractTokenFilterFactory;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.env.Environment;
import org.elasticsearch.index.IndexSettings;
/**
* Created by glenn on 16.01.17.
*/
public class PermutationTokenFilterFactory extends AbstractTokenFilterFactory {
public PermutationTokenFilterFactory(IndexSettings indexSettings, Environment environment, String name, Settings settings) {
super(indexSettings, name, settings);
}
public PermutationTokenFilter create(TokenStream input) {
return new PermutationTokenFilter(input);
}
}
This class is needed to provide the filter to the Elasticsearch plugin.
Follow this guide to set up the configuration that an Elasticsearch plugin requires.
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>be.smartspoken</groupId>
<artifactId>permutation-plugin</artifactId>
<version>5.1.1-SNAPSHOT</version>
<packaging>jar</packaging>
<name>Plugin: Permutation</name>
<description>Permutation plugin for elasticsearch</description>
<properties>
<lucene.version>6.3.0</lucene.version>
<elasticsearch.version>5.1.1</elasticsearch.version>
<java.version>1.8</java.version>
<log4j2.version>2.7</log4j2.version>
</properties>
<dependencies>
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-api</artifactId>
<version>${log4j2.version}</version>
</dependency>
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-core</artifactId>
<version>${log4j2.version}</version>
</dependency>
<dependency>
<groupId>org.apache.lucene</groupId>
<artifactId>lucene-test-framework</artifactId>
<version>${lucene.version}</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.lucene</groupId>
<artifactId>lucene-core</artifactId>
<version>${lucene.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.lucene</groupId>
<artifactId>lucene-analyzers-common</artifactId>
<version>${lucene.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.elasticsearch</groupId>
<artifactId>elasticsearch</artifactId>
<version>${elasticsearch.version}</version>
<scope>provided</scope>
</dependency>
</dependencies>
<build>
<resources>
<resource>
<directory>src/main/resources</directory>
<filtering>false</filtering>
<excludes>
<exclude>*.properties</exclude>
</excludes>
</resource>
</resources>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-assembly-plugin</artifactId>
<version>2.6</version>
<configuration>
<appendAssemblyId>false</appendAssemblyId>
<outputDirectory>${project.build.directory}/releases/</outputDirectory>
<descriptors>
<descriptor>${basedir}/src/main/assemblies/plugin.xml</descriptor>
</descriptors>
</configuration>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.3</version>
<configuration>
<source>${java.version}</source>
<target>${java.version}</target>
</configuration>
</plugin>
</plugins>
</build>
</project>
Make sure you use the correct Elasticsearch, Lucene and Log4j 2 versions in your pom.xml file, and provide the correct configuration files.
import be.smartspoken.plugin.permutation.filter.PermutationTokenFilterFactory;
import org.elasticsearch.index.analysis.TokenFilterFactory;
import org.elasticsearch.indices.analysis.AnalysisModule;
import org.elasticsearch.plugins.AnalysisPlugin;
import org.elasticsearch.plugins.Plugin;
import java.util.HashMap;
import java.util.Map;
/**
* Created by glenn on 13.01.17.
*/
public class PermutationPlugin extends Plugin implements AnalysisPlugin{
@Override
public Map<String, AnalysisModule.AnalysisProvider<TokenFilterFactory>> getTokenFilters() {
Map<String, AnalysisModule.AnalysisProvider<TokenFilterFactory>> extra = new HashMap<>();
extra.put("permutation", PermutationTokenFilterFactory::new);
return extra;
}
}
This provides the factory to the plugin.
After installing the new plugin you need to restart Elasticsearch.
Add a new custom analyzer to mimic the 2.x behaviour:
Settings.builder()
.put("number_of_shards", 2)
.loadFromSource(jsonBuilder()
.startObject()
.startObject("analysis")
.startObject("analyzer")
.startObject("permutation_analyzer")
.field("tokenizer", "keyword")
.field("filter", new String[]{"permutation","lowercase"})
.endObject()
.endObject()
.endObject()
.endObject().string())
.loadFromSource(jsonBuilder()
.startObject()
.startObject("analysis")
.startObject("analyzer")
.startObject("lowercase_keyword_analyzer")
.field("tokenizer", "keyword")
.field("filter", new String[]{"lowercase"})
.endObject()
.endObject()
.endObject()
.endObject().string())
.build();
Now all you have to do is supply the custom analyzers in your object mapping:
{
"my_object": {
"dynamic_templates": [{
"autocomplete": {
"path_match": "my.autocomplete.object.path",
"match_mapping_type": "*",
"mapping": {
"type": "completion",
"analyzer": "permutation_analyzer", /* custom analyzer */
"search_analyzer": "lowercase_keyword_analyzer" /* custom analyzer */
}
}
}],
"properties": {
/*your other properties*/
}
}
}
This also improves performance, because you no longer have to wait for the permutations to be built.
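The split between the two analyzers can be pictured with a small library-free simulation: at index time the value is stored as lowercased permutations, while at search time the query is only lowercased and then used as a prefix. The class `CompletionLookupSketch` and its methods are invented for this sketch. The real completion suggester uses a much more efficient in-memory structure than a list scan, so treat this as a mental model only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class CompletionLookupSketch {

    /** Mimics the search analyzer: keep the whole query as one token and lowercase it. */
    static String analyzeQuery(String query) {
        return query.toLowerCase(Locale.ROOT);
    }

    /** True if the analyzed query is a prefix of any indexed (permuted, lowercased) input. */
    static boolean matches(List<String> indexedInputs, String query) {
        String prefix = analyzeQuery(query);
        for (String input : indexedInputs) {
            if (input.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // What the index analyzer would store for "All available colours"
        List<String> indexed = Arrays.asList(
                "all available colours", "all colours available",
                "available all colours", "available colours all",
                "colours all available", "colours available all");

        System.out.println(matches(indexed, "Colours av"));
        System.out.println(matches(indexed, "blue"));
    }
}
```

Since the permutations only exist on the index side, the query itself stays a single lowercased token, whatever word order the user types.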