
What is a good alternative to the output field in the Elasticsearch 5.1 completion suggester?

东郭存
2023-03-14
Question

The first error I ran into when indexing data in ES 5.1 came from my completion suggester mapping, which contained an output field.

message [MapperParsingException[failed to parse]; nested: IllegalArgumentException[unknown field name [output], must be one of [input, weight, contexts]];]

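So I removed it, but now many of my autocompletions are wrong, because the suggester returns the matched input instead of a single output string.

After some googling, I found this article from ES, which states the following:

As suggestions are document-oriented, suggestion metadata (e.g. output) should now be specified as a field in the document. The support for specifying output when indexing suggestion entries has been removed. Now suggestion result entry's text is always the un-analyzed value of the suggestion's input (same as not specifying output while indexing suggestions in pre-5.0 indices).

I found that the original value can be read from the _source field returned with the suggestion, but that is not really a solution for me, because the keys and the structure vary depending on the original object.

I could add an extra "output" field to the original object, but that is not a solution for me either, because in some cases I have a structure like this: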

{
    "id": "c2358e0c-7399-4665-ac2c-0bdd44597ac0",
    "synonyms": ["All available colours", "Colors"],
    "autoComplete": [{
        "input": ["colours available all", "available colours all", "available all colours", "colours all available", "all available colours", "all colours available"]
    }, {
        "input": ["colors"]
    }]
}

In ES 2.4 the structure looked like this:

{
    "id": "c2358e0c-7399-4665-ac2c-0bdd44597ac0",
    "synonyms": ["All available colours", "Colors"],
    "SmartSynonym": [{
        "input": ["colours available all", "available colours all", "available all colours", "colours all available", "all available colours", "all colours available"],
        "output": ["All available colours"]
    }, {
        "input": ["colors"],
        "output": ["Colors"]
    }]
}

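This was never a problem while every autocomplete object had its own "output" field.

How can I make ES 5.1 return the original value (e.g. "All available colours") in a simple way when a user types something like "all available colours", without having to do a lot of manual lookups?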


Answer:

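We eventually dropped the custom plugin described in the original answer, because it was hard to get it working on Elastic Cloud. Instead, we simply created a separate document for autocompletion and removed the autocomplete fields from all other documents.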

The objects

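// Assumption: @JsonCreator comes from Jackson 2.x
import com.fasterxml.jackson.annotation.JsonCreator;

import java.util.Map;

public class Suggest {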
    /*
     * Contains the actual value it needs to return
     * iphone 8 plus, plus iphone 8, 8 plus iphone, ...
     * will all result into iphone 8 plus for example
     */
    private String autocompleteOutput;
    /*
     * Contains the field and all the values of that field to autocomplete
     */
    private Map<String, AutoComplete> autoComplete;

    @JsonCreator
    Suggest() {
    }

    public Suggest(String autocompleteOutput, Map<String, AutoComplete> autoComplete) {
        this.autocompleteOutput = autocompleteOutput;
        this.autoComplete = autoComplete;
    }

    public String getAutocompleteOutput() {
        return autocompleteOutput;
    }

    public void setAutocompleteOutput(String autocompleteOutput) {
        this.autocompleteOutput = autocompleteOutput;
    }

    public Map<String, AutoComplete> getAutoComplete() {
        return autoComplete;
    }

    public void setAutoComplete(Map<String, AutoComplete> autoComplete) {
        this.autoComplete = autoComplete;
    }
}

public class AutoComplete {
    /*
     * Contains the permutation values from the Lucene filter (see the original answer)
     */
    private String[] input;

    @JsonCreator
    AutoComplete() {
    }

    public AutoComplete(String[] input) {
        this.input = input;
    }

    public String[] getInput() {
        return input;
    }
}

With the following mapping:

{
  "suggest": {
    "dynamic_templates": [
      {
        "autocomplete": {
          "path_match": "autoComplete.*",
          "match_mapping_type": "*",
          "mapping": {
            "type": "completion",
            "analyzer": "lowercase_keyword_analyzer"
          }
        }
      }
    ],
    "properties": {}
  }
}

This lets us use the autocompleteOutput field from _source.
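As a rough illustration of reading that field, here is a minimal sketch that collects the distinct autocompleteOutput values from a suggest response that has already been parsed into plain Java maps. The class name SuggestOutputReader, the suggestion name "product" and the exact response layout (a "suggest" object whose options each carry a "_source") are assumptions made for this example, not something taken from the answer; adjust them to whatever your client returns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SuggestOutputReader {

    // Collects the distinct autocompleteOutput values, in response order.
    // Assumed shape of the parsed response (hypothetical, verify against your cluster):
    // { "suggest": { <name>: [ { "options": [ { "_source": { "autocompleteOutput": "..." } } ] } ] } }
    @SuppressWarnings("unchecked")
    public static List<String> outputs(Map<String, Object> response, String suggestionName) {
        Set<String> distinct = new LinkedHashSet<>();
        Map<String, Object> suggest = (Map<String, Object>) response.get("suggest");
        if (suggest == null) {
            return new ArrayList<>();
        }
        List<Map<String, Object>> entries = (List<Map<String, Object>>) suggest.get(suggestionName);
        if (entries == null) {
            return new ArrayList<>();
        }
        for (Map<String, Object> entry : entries) {
            List<Map<String, Object>> options = (List<Map<String, Object>>) entry.get("options");
            if (options == null) {
                continue;
            }
            for (Map<String, Object> option : options) {
                Map<String, Object> source = (Map<String, Object>) option.get("_source");
                // Skip options without a usable output instead of failing
                if (source != null && source.get("autocompleteOutput") instanceof String) {
                    distinct.add((String) source.get("autocompleteOutput"));
                }
            }
        }
        return new ArrayList<>(distinct);
    }

    public static void main(String[] args) {
        // Hand-built sample standing in for a parsed response
        Map<String, Object> source = new HashMap<>();
        source.put("autocompleteOutput", "iphone 8 plus");
        Map<String, Object> option = new HashMap<>();
        option.put("text", "8 plus iphone");
        option.put("_source", source);
        Map<String, Object> entry = new HashMap<>();
        entry.put("options", Arrays.asList(option));
        Map<String, Object> suggest = new HashMap<>();
        suggest.put("product", Collections.singletonList(entry));
        Map<String, Object> response = new HashMap<>();
        response.put("suggest", suggest);

        System.out.println(outputs(response, "product"));
    }
}
```

Because every suggest document has the same shape, the lookup no longer depends on the keys or structure of the original objects, which was the objection raised in the question.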

Original answer

After some research, I ended up creating a new Elasticsearch 5.1.1 plugin.

Create a Lucene filter

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionLengthAttribute;

import java.io.IOException;
import java.util.*;

/**
 * Created by glenn on 13.01.17.
 */
public class PermutationTokenFilter extends TokenFilter {
    private final CharTermAttribute charTermAtt;
    private final PositionIncrementAttribute posIncrAtt;
    private final OffsetAttribute offsetAtt;
    private Iterator<String> permutations;
    private int origOffset;

    /**
     * Construct a token stream filtering the given input.
     *
     * @param input
     */
    protected PermutationTokenFilter(TokenStream input) {
        super(input);
        this.charTermAtt = addAttribute(CharTermAttribute.class);
        this.posIncrAtt = addAttribute(PositionIncrementAttribute.class);
        this.offsetAtt = addAttribute(OffsetAttribute.class);
    }

    @Override
    public final boolean incrementToken() throws IOException {
        while (true) {
            //see if permutations have been created already
            if (permutations == null) {
                //see if more tokens are available
                if (!input.incrementToken()) {
                    return false;
                } else {
                    //Get value
                    String value = String.valueOf(charTermAtt);
                    //permute over buffer value and create iterator
                    permutations = permutation(value).iterator();
                    origOffset = posIncrAtt.getPositionIncrement();
                }
            }
            //see if there are remaining permutations
            if (permutations.hasNext()) {
                //Reset the attribute to starting point
                clearAttributes();
                //use the next permutation
                String permutation = permutations.next();
                //add the permutation to the attributes and remove old attributes
                charTermAtt.setEmpty().append(permutation);
                posIncrAtt.setPositionIncrement(origOffset);
                offsetAtt.setOffset(0,permutation.length());
                //remove permutation from iterator
                permutations.remove();
                origOffset = 0;
                return true;
            }
            permutations = null;
        }
    }

    /**
     * Changes the order of a multi value keyword so the completion suggester still knows the original value without
     * tokenizing it if the users asks the words in a different order.
     *
     * @param value unpermuted value ex: Yellow Crazy Banana
     * @return Permuted values ex:
     * Yellow Crazy Banana,
     * Yellow Banana Crazy,
     * Crazy Yellow Banana,
     * Crazy Banana Yellow,
     * Banana Crazy Yellow,
     * Banana Yellow Crazy
     */
    private Set<String> permutation(String value) {
        value = value.trim().replaceAll(" +", " ");
        // Use sets to eliminate semantic duplicates (a a b is still a a b even if you switch the two 'a's in case one word occurs multiple times in a single value)
        // Switch to HashSet for better performance
        Set<String> set = new HashSet<String>();
        String[] words = value.split(" ");
        // Termination condition: only 1 permutation for a array of 1 word
        if (words.length == 1) {
            set.add(value);
        } else if (words.length <= 6) {
            // Give each word a chance to be the first in the permuted array
            for (int i = 0; i < words.length; i++) {
                // Remove the word at index i from the array
                String pre = "";
                for (int j = 0; j < i; j++) {
                    pre += words[j] + " ";
                }

                String post = " ";
                for (int j = i + 1; j < words.length; j++) {
                    post += words[j] + " ";
                }
                String remaining = (pre + post).trim();

                // Recurse to find all the permutations of the remaining words
                for (String permutation : permutation(remaining)) {
                    // Concatenate the first word with the permutations of the remaining words
                    set.add(words[i] + " " + permutation);
                }
            }
        } else {
            Collections.addAll(set, words);
            set.add(value);
        }
        return set;
    }
}

This filter takes the original input token "All available colours" and permutes it into every possible word order (see the original question).
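To see what the filter emits without setting up Lucene, the permutation step can be sketched on its own. This is a simplified stand-alone version of the recursion in the filter, not the plugin code itself: the class name PermutationSketch and the constant MAX_WORDS are made up for this example, and the Lucene attribute handling is left out entirely.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

public class PermutationSketch {

    // Same cap as the filter: beyond six words the number of orderings grows too fast
    private static final int MAX_WORDS = 6;

    // Returns every word order of the phrase; a set drops duplicates caused by repeated words
    public static Set<String> permute(String value) {
        String normalized = value.trim().replaceAll(" +", " ");
        String[] words = normalized.split(" ");
        Set<String> result = new LinkedHashSet<>();
        if (words.length == 1) {
            result.add(normalized);
        } else if (words.length <= MAX_WORDS) {
            for (int i = 0; i < words.length; i++) {
                // Build the phrase without the word at index i
                StringBuilder remaining = new StringBuilder();
                for (int j = 0; j < words.length; j++) {
                    if (j != i) {
                        if (remaining.length() > 0) {
                            remaining.append(' ');
                        }
                        remaining.append(words[j]);
                    }
                }
                // Put word i in front of every ordering of the remaining words
                for (String tail : permute(remaining.toString())) {
                    result.add(words[i] + " " + tail);
                }
            }
        } else {
            // Fallback used by the filter for long phrases: single words plus the full value
            Collections.addAll(result, words);
            result.add(normalized);
        }
        return result;
    }

    public static void main(String[] args) {
        for (String ordering : permute("All available colours")) {
            // Lowercasing mirrors the lowercase filter that follows in the analyzer chain
            System.out.println(ordering.toLowerCase());
        }
    }
}
```

For a three-word phrase this yields the six orderings listed as inputs in the question, which is why the inputs no longer have to be generated by hand before indexing.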

Create the factory

import org.apache.lucene.analysis.TokenStream;
import org.elasticsearch.index.analysis.AbstractTokenFilterFactory;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.env.Environment;
import org.elasticsearch.index.IndexSettings;


/**
 * Created by glenn on 16.01.17.
 */
public class PermutationTokenFilterFactory extends AbstractTokenFilterFactory {

    public PermutationTokenFilterFactory(IndexSettings indexSettings, Environment environment, String name, Settings settings) {
        super(indexSettings, name, settings);
    }

    public PermutationTokenFilter create(TokenStream input) {
        return new PermutationTokenFilter(input);
    }
}

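This class is needed to provide the filter to the Elasticsearch plugin.

Create the Elasticsearch plugin

Follow this guide to set up the configuration required for an Elasticsearch plugin.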

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>be.smartspoken</groupId>
    <artifactId>permutation-plugin</artifactId>
    <version>5.1.1-SNAPSHOT</version>
    <packaging>jar</packaging>
    <name>Plugin: Permutation</name>
    <description>Permutation plugin for elasticsearch</description>
    <properties>
        <lucene.version>6.3.0</lucene.version>
        <elasticsearch.version>5.1.1</elasticsearch.version>
        <java.version>1.8</java.version>
        <log4j2.version>2.7</log4j2.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-api</artifactId>
            <version>${log4j2.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-core</artifactId>
            <version>${log4j2.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-test-framework</artifactId>
            <version>${lucene.version}</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-core</artifactId>
            <version>${lucene.version}</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-analyzers-common</artifactId>
            <version>${lucene.version}</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.elasticsearch</groupId>
            <artifactId>elasticsearch</artifactId>
            <version>${elasticsearch.version}</version>
            <scope>provided</scope>
        </dependency>
    </dependencies>

    <build>
        <resources>
            <resource>
                <directory>src/main/resources</directory>
                <filtering>false</filtering>
                <excludes>
                    <exclude>*.properties</exclude>
                </excludes>
            </resource>
        </resources>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>2.6</version>
                <configuration>
                    <appendAssemblyId>false</appendAssemblyId>
                    <outputDirectory>${project.build.directory}/releases/</outputDirectory>
                    <descriptors>
                        <descriptor>${basedir}/src/main/assemblies/plugin.xml</descriptor>
                    </descriptors>
                </configuration>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.3</version>
                <configuration>
                    <source>${java.version}</source>
                    <target>${java.version}</target>
                </configuration>
            </plugin>
        </plugins>
    </build>

</project>

Make sure you use the correct Elasticsearch, Lucene and Log4j 2 versions in your pom.xml file, and provide the correct configuration files.

import be.smartspoken.plugin.permutation.filter.PermutationTokenFilterFactory;
import org.elasticsearch.index.analysis.TokenFilterFactory;
import org.elasticsearch.indices.analysis.AnalysisModule;
import org.elasticsearch.plugins.AnalysisPlugin;
import org.elasticsearch.plugins.Plugin;

import java.util.HashMap;
import java.util.Map;

/**
 * Created by glenn on 13.01.17.
 */
public class PermutationPlugin extends Plugin implements AnalysisPlugin{

    @Override
    public Map<String, AnalysisModule.AnalysisProvider<TokenFilterFactory>> getTokenFilters() {
        Map<String, AnalysisModule.AnalysisProvider<TokenFilterFactory>> extra = new HashMap<>();
        extra.put("permutation", PermutationTokenFilterFactory::new);
        return extra;
    }
}

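This provides the factory to the plugin.

After installing the new plugin, you need to restart Elasticsearch.

Use the plugin

Add new custom analyzers to mimic the 2.x functionality: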

            Settings.builder()
                .put("number_of_shards", 2)
                .loadFromSource(jsonBuilder()
                        .startObject()
                            .startObject("analysis")
                                .startObject("analyzer")
                                    .startObject("permutation_analyzer")
                                        .field("tokenizer", "keyword")
                                        .field("filter", new String[]{"permutation","lowercase"})
                                    .endObject()
                                .endObject()
                            .endObject()
                        .endObject().string())
                .loadFromSource(jsonBuilder()
                        .startObject()
                            .startObject("analysis")
                                .startObject("analyzer")
                                    .startObject("lowercase_keyword_analyzer")
                                        .field("tokenizer", "keyword")
                                        .field("filter", new String[]{"lowercase"})
                                    .endObject()
                                .endObject()
                            .endObject()
                        .endObject().string())
                .build();

Now all you have to do is supply the custom analyzers in your object mapping:

{
    "my_object": {
        "dynamic_templates": [{
            "autocomplete": {
                "path_match": "my.autocomplete.object.path",
                "match_mapping_type": "*",
                "mapping": {
                    "type": "completion",
                    "analyzer": "permutation_analyzer", /* custom analyzer */
                    "search_analyzer": "lowercase_keyword_analyzer" /* custom analyzer */
                }
            }
        }],
        "properties": {
            /*your other properties*/
        }
    }
}

This also improves performance, because you no longer have to wait for the permutations to be built.
