Question:

Completion suggester in Elasticsearch on an existing field

司徒良哲
2023-03-14

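In my Elasticsearch index I have indexed a bunch of jobs. For simplicity, let's just say they are a bunch of job titles. When people type a job title into my search engine, I want to "autocomplete" it with possible matches.

I looked into the completion suggester here: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-suggesters-completion.html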

"JobTitle": {
    "type": "string",
    "fields": {
        "Original": {
            "type": "string",
            "index": "not_analyzed"
        }
    }
}

If not, is it possible to do a non-whitespace-tokenized/n-gram search to get these fields? It would be slower, but I assume it would work.

1 answer

唐恺
2023-03-14

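Okay, here is a simple way to do it (which may or may not scale), using prefix queries.

I'll create an index using the "fields" technique you mentioned, along with some handy job-description data I found here: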

DELETE /test_index

PUT /test_index
{
   "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 0
   },
   "mappings": {
      "doc": {
         "properties": {
            "title": {
               "type": "string",
               "fields": {
                  "raw": {
                     "type": "string",
                     "index": "not_analyzed"
                  }
               }
            }
         }
      }
   }
}

PUT /test_index/_bulk
{"index":{"_index":"test_index","_type":"doc"}}
{"title":"experienced bra fitter", "desc":"I bet they had trouble finding candidates for this one."}
{"index":{"_index":"test_index","_type":"doc"}}
{"title":"PlayStation Brand Ambassador", "desc":"please report to your residence in the United States of Nintendo."}
{"index":{"_index":"test_index","_type":"doc"}}
{"title":"Eyebrow Threading", "desc":"I REALLY hope this has something to do with dolls."}
{"index":{"_index":"test_index","_type":"doc"}}
{"title":"Administraive/ Secretary", "desc":"ok, ok, we get it. It’s clear where you need help."}
{"index":{"_index":"test_index","_type":"doc"}}
{"title":"Finish Carpenter", "desc":"for when the Start Carpenter gets tired."}
{"index":{"_index":"test_index","_type":"doc"}}
{"title":"Helpdesk Technician @ Pentagon", "desc":"“Uh, hello? I’m having a problem with this missile…”"}
{"index":{"_index":"test_index","_type":"doc"}}
{"title":"Nail Tech", "desc":"so nails can be pretty complicated…"}
{"index":{"_index":"test_index","_type":"doc"}}
{"title":"Remedy Engineer", "desc":"aren’t those called “doctors”?"}
{"index":{"_index":"test_index","_type":"doc"}}
{"title":"Saltlick Cashier", "desc":"new trend in the equestrian industry. Ok, enough horsing around."}
{"index":{"_index":"test_index","_type":"doc"}}
{"title":"Molecular Biologist II", "desc":"when Molecular Biologist I gets promoted."}
{"index":{"_index":"test_index","_type":"doc"}}
{"title":"Breakfast Sandwich Maker", "desc":"we also got one of these recently."}
{"index":{"_index":"test_index","_type":"doc"}}
{"title":"Hotel Housekeepers", "desc":"why can’t they just say ‘hotelkeepers’?"}
{"index":{"_index":"test_index","_type":"doc"}}
{"title":"Preschool Teacher #4065", "desc":"either that’s a really big school or they’ve got robot teachers."}
{"index":{"_index":"test_index","_type":"doc"}}
{"title":"glacéau drop team", "desc":"for a new sport at the Winter Olympics: ice-water spilling."}
{"index":{"_index":"test_index","_type":"doc"}}
{"title":"PLUMMER/ELECTRICIAN", "desc":"get a dictionary/thesaurus first."}
{"index":{"_index":"test_index","_type":"doc"}}
{"title":"DoodyCalls Technician", "desc":"they really shouldn’t put down janitors like that."}
{"index":{"_index":"test_index","_type":"doc"}}
{"title":"Golf Staff", "desc":"and here I thought they were called clubs."}
{"index":{"_index":"test_index","_type":"doc"}}
{"title":"Pressure Washers", "desc":"what’s next, heat cleaners?"}
{"index":{"_index":"test_index","_type":"doc"}}
{"title":"Sandwich Artist", "desc":"another “Jesus in my food” wannabe."}
{"index":{"_index":"test_index","_type":"doc"}}
{"title":"Self Storage Manager", "desc":"this is for self storage?"}
{"index":{"_index":"test_index","_type":"doc"}}
{"title":"Qualified Infant Caregiver", "desc":"too bad for all the unqualified caregivers on the list."}
{"index":{"_index":"test_index","_type":"doc"}}
{"title":"Ground Support", "desc":"but there’s just more dirt under there."}
{"index":{"_index":"test_index","_type":"doc"}}
{"title":"Gymboree Teacher", "desc":"the hardest part is not burning your hands sliding down the pole."}
{"index":{"_index":"test_index","_type":"doc"}}
{"title":"COMMERCIAL space hunter", "desc":"so they did find animals further out in the cosmos? Who knew."}
{"index":{"_index":"test_index","_type":"doc"}}
{"title":"JOB COACH", "desc":"if they’re unemployed when they get to you, what does that say about them?"}
{"index":{"_index":"test_index","_type":"doc"}}
{"title":"KIDS KAMP INSTRUCTOR!", "desc":"no spelling ability required."}
{"index":{"_index":"test_index","_type":"doc"}}
{"title":"POOLS SUPERVISOR", "desc":"“yeah, they’re still wet…”"}
{"index":{"_index":"test_index","_type":"doc"}}
{"title":"HOUSE MANAGER/TEEN SUPERVISOR", "desc":"see the dictionary under P, for Parent."}
{"index":{"_index":"test_index","_type":"doc"}}
{"title":"Licensed Seamless Gutter Contractor", "desc":"just sounds bad."}

Then I can easily run a prefix query:

POST /test_index/_search
{
    "query": {
        "prefix": {
           "title": {
              "value": "san"
           }
        }
    }
}
...
{
   "took": 6,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 2,
      "max_score": 1,
      "hits": [
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "mcRfqtwzTyWE7ZNsKFvwEg",
            "_score": 1,
            "_source": {
               "title": "Breakfast Sandwich Maker",
               "desc": "we also got one of these recently."
            }
         },
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "fIYV0WOWRe6gfpYy_u2jlg",
            "_score": 1,
            "_source": {
               "title": "Sandwich Artist",
               "desc": "another “Jesus in my food” wannabe."
            }
         }
      ]
   }
}

The prefix in the query above is compared against every analyzed (lowercased) word in the title. Running it against the not_analyzed title.raw field instead matches only titles that begin with the exact, case-sensitive prefix:

POST /test_index/_search
{
    "query": {
        "prefix": {
           "title.raw": {
              "value": "San"
           }
        }
    }
}
...
{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 1,
      "hits": [
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "fIYV0WOWRe6gfpYy_u2jlg",
            "_score": 1,
            "_source": {
               "title": "Sandwich Artist",
               "desc": "another “Jesus in my food” wannabe."
            }
         }
      ]
   }
}
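
To see why the two queries return different hits, here is a minimal Python sketch of the matching logic. It is an illustration only, not how Elasticsearch is implemented: `analyzed_tokens` is a hypothetical helper that approximates the default analyzer by splitting on non-word characters and lowercasing, and the title list is a small subset of the sample data above.

```python
import re

# A few titles from the sample data above.
TITLES = [
    "Breakfast Sandwich Maker",
    "Sandwich Artist",
    "Saltlick Cashier",
    "Self Storage Manager",
]


def analyzed_tokens(text):
    """Rough stand-in for the analyzed "title" field (assumption:
    split on non-word characters, then lowercase)."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def prefix_on_analyzed(prefix, titles):
    """Prefix query on "title": the prefix is not analyzed and is
    compared against each lowercased token."""
    return [
        title for title in titles
        if any(token.startswith(prefix) for token in analyzed_tokens(title))
    ]


def prefix_on_raw(prefix, titles):
    """Prefix query on "title.raw": the whole original string is one term."""
    return [title for title in titles if title.startswith(prefix)]


print(prefix_on_analyzed("san", TITLES))  # ['Breakfast Sandwich Maker', 'Sandwich Artist']
print(prefix_on_raw("San", TITLES))       # ['Sandwich Artist']
print(prefix_on_raw("san", TITLES))       # [] because the raw field is case-sensitive
```

For autocomplete this means the analyzed field suits "any word starts with" matching, while the raw field suits "the title starts with" matching, provided the user's input is cased the same way as the stored title.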

Here is the code I used:

http://sense.qbox.io/gist/4e066d051d7dab5fe819264b0f4b26d958d115a9

Edit: n-gram version

DELETE /test_index

PUT /test_index
{
   "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 0,
      "analysis": {
         "filter": {
            "nGram_filter": {
               "type": "nGram",
               "min_gram": 2,
               "max_gram": 20,
               "token_chars": [
                  "letter",
                  "digit",
                  "punctuation",
                  "symbol"
               ]
            }
         },
         "analyzer": {
            "nGram_analyzer": {
               "type": "custom",
               "tokenizer": "whitespace",
               "filter": [
                  "lowercase",
                  "asciifolding",
                  "nGram_filter"
               ]
            },
            "whitespace_analyzer": {
               "type": "custom",
               "tokenizer": "whitespace",
               "filter": [
                  "lowercase",
                  "asciifolding"
               ]
            }
         }
      }
   },
   "mappings": {
      "doc": {
         "properties": {
            "title": {
               "type": "string",
               "index_analyzer": "nGram_analyzer", 
               "search_analyzer": "whitespace_analyzer", 
               "fields": {
                  "raw": {
                     "type": "string",
                     "index": "not_analyzed"
                  }
               }
            }
         }
      }
   }
}
POST /test_index/_search
{
    "query": {
        "match": {
           "title": "sup"
        }
    }
}
...
{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 3,
      "max_score": 1.8631258,
      "hits": [
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "4pcAOmPNSYupjz7lSes8jw",
            "_score": 1.8631258,
            "_source": {
               "title": "Ground Support",
               "desc": "but there’s just more dirt under there."
            }
         },
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "DVFOC6DsTa6eH_a-RtbUUw",
            "_score": 1.8631258,
            "_source": {
               "title": "POOLS SUPERVISOR",
               "desc": "“yeah, they’re still wet…”"
            }
         },
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "klleY_bnQ4uFmCPF94sLOw",
            "_score": 1.4905007,
            "_source": {
               "title": "HOUSE MANAGER/TEEN SUPERVISOR",
               "desc": "see the dictionary under P, for Parent."
            }
         }
      ]
   }
}
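
The n-gram approach can be sketched in plain Python to show why "sup" finds those three titles. This is a simplified model under stated assumptions, not Elasticsearch's implementation: `ascii_fold` is a hypothetical approximation of the asciifolding filter built on Unicode decomposition, scoring is ignored entirely, and the title list is a subset of the sample data.

```python
import unicodedata

TITLES = [
    "Ground Support",
    "POOLS SUPERVISOR",
    "HOUSE MANAGER/TEEN SUPERVISOR",
    "Sandwich Artist",
    "glacéau drop team",
]


def ascii_fold(text):
    """Approximate asciifolding: decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def whitespace_analyze(text):
    """Model of whitespace_analyzer: split on whitespace, lowercase, fold."""
    return [ascii_fold(token.lower()) for token in text.split()]


def ngrams(token, min_gram=2, max_gram=20):
    """All substrings of the token whose length is in [min_gram, max_gram]."""
    grams = set()
    for size in range(min_gram, max_gram + 1):
        for start in range(len(token) - size + 1):
            grams.add(token[start:start + size])
    return grams


def index_terms(title):
    """Model of nGram_analyzer: the n-grams of every analyzed token."""
    terms = set()
    for token in whitespace_analyze(title):
        terms |= ngrams(token)
    return terms


def match(query, titles):
    """Model of a match query: a hit if any query term equals an indexed n-gram."""
    query_terms = whitespace_analyze(query)
    return [
        title for title in titles
        if any(term in index_terms(title) for term in query_terms)
    ]


print(match("sup", TITLES))
# ['Ground Support', 'POOLS SUPERVISOR', 'HOUSE MANAGER/TEEN SUPERVISOR']
print(match("ceau", TITLES))  # ['glacéau drop team']
print(match("s", TITLES))     # [] because one character is shorter than min_gram
```

Applying n-grams only at index time and a plain whitespace analyzer at search time keeps the query as a single term ("sup") instead of exploding it into "su", "up", and "sup", which would match far more titles.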

http://sense.qbox.io/gist/b0e77bb7f05a4527de5ab4345749c793f923794c
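
The requests above use the mapping syntax of older Elasticsearch releases (`string`, `not_analyzed`, `index_analyzer`, a `doc` mapping type). A rough sketch of a comparable index body for Elasticsearch 7 or later, built as a Python dict, might look like the following. The assumptions here are mine, not part of the original answer: `text` and `keyword` replace `string`, `analyzer` replaces `index_analyzer`, the mapping type level is dropped, the filter type is spelled `ngram`, `index.max_ngram_diff` is raised because the default limit on `max_gram - min_gram` is 1, and `token_chars` is omitted since it is a tokenizer option rather than a token filter option. Check it against the documentation for your version before relying on it.

```python
import json

MIN_GRAM, MAX_GRAM = 2, 20  # same bounds as the n-gram filter above

# Assumed Elasticsearch 7+ equivalent of the n-gram index definition.
index_body = {
    "settings": {
        "index": {
            "number_of_shards": 1,
            "number_of_replicas": 0,
            # Allow a min/max gram gap wider than the default of 1.
            "max_ngram_diff": MAX_GRAM - MIN_GRAM,
        },
        "analysis": {
            "filter": {
                "nGram_filter": {
                    "type": "ngram",
                    "min_gram": MIN_GRAM,
                    "max_gram": MAX_GRAM,
                }
            },
            "analyzer": {
                "nGram_analyzer": {
                    "type": "custom",
                    "tokenizer": "whitespace",
                    "filter": ["lowercase", "asciifolding", "nGram_filter"],
                },
                "whitespace_analyzer": {
                    "type": "custom",
                    "tokenizer": "whitespace",
                    "filter": ["lowercase", "asciifolding"],
                },
            },
        },
    },
    "mappings": {
        "properties": {
            "title": {
                "type": "text",
                "analyzer": "nGram_analyzer",           # used at index time
                "search_analyzer": "whitespace_analyzer",  # used at query time
                "fields": {"raw": {"type": "keyword"}},
            }
        }
    },
}

# Print the JSON body that would be sent with PUT /test_index.
print(json.dumps(index_body, indent=2))
```

The `match` query on `title` shown earlier should work unchanged against an index created this way, since only the mapping and settings syntax differs.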
