Question:

Range and term boosting in Elasticsearch

咸臻
2023-03-14

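In Elasticsearch, I am struggling to get boosting to work the way I want.

Suppose I have indexed some profiles containing gender, interests, and age, and suppose I consider a gender match the most relevant criterion, then interests, with the user's age the least important. I expected the query below to order the matching profiles according to that principle, but when I run it, I first get some males, then Anna, a 50-year-old female, and only then Maria, a female who likes cars. Why doesn't Maria get a higher score than Anna?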

{
    "query": {
        "bool" : {
            "should" : [
                { "term"  : { "gender" : { "term": "male", "boost": 10.0 } } },
                { "term"  : { "likes"  : { "term": "cars", "boost" : 5.0 } } },
                { "range" : { "age"    : { "from" : 50,    "boost" : 1.0 } } }
            ],
            "minimum_number_should_match" : 1
        }
    }    
}

Any thoughts would be much appreciated,

斯汀

Here are the curl commands I executed:

$ curl -XPUT http://localhost:9200/users/profile/1 -d '{
    "nickname" : "bob",
    "gender" : "male",
    "age" : 48,
    "likes" : "airplanes"
}'

$ curl -XPUT http://localhost:9200/users/profile/2 -d '{
    "nickname" : "carlos",
    "gender" : "male",
    "age" : 24,
    "likes" : "food"
}'

$ curl -XPUT http://localhost:9200/users/profile/3 -d '{
    "nickname" : "julio",
    "gender" : "male",
    "age" : 18,
    "likes" : "ladies"
}'

$ curl -XPUT http://localhost:9200/users/profile/4 -d '{
    "nickname" : "maria",
    "gender" : "female",
    "age" : 25,
    "likes" : "cars"
}'

$ curl -XPUT http://localhost:9200/users/profile/5 -d '{
    "nickname" : "anna",
    "gender" : "female",
    "age" : 50,
    "likes" : "clothes"
}'

$ curl -XGET http://localhost:9200/users/profile/_search -d '{
    "query": {
        "bool" : {
            "should" : [
                { "term" : { "gender" : { "term": "male", "boost": 10.0 } } },
                { "term" : { "likes" : { "term": "cars", "boost" : 5.0 } } },
                { "range" : { "age" : { "from" : 50, "boost" : 1.0 } } }
            ],
            "minimum_number_should_match" : 1
        }
    }    
}'

1 answer

姜天宇
2023-03-14

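The boost value is not absolute: it is combined with other factors to determine the relevance of each term.

You have two "genders" (I would assume), but many different "likes". So male is considered almost irrelevant, because it occurs so frequently within your data. However, cars may only occur a few times, and so it is considered more relevant.

This logic is useful for full-text search, but not for enums, which are essentially intended to be used as filters.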
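To make the frequency effect concrete, here is a minimal sketch of the inverse document frequency (IDF) term from Lucene's classic TF-IDF similarity, applied to the five sample profiles from the question. It assumes classic TF-IDF scoring and a single shard; on a multi-shard index each shard computes its own statistics, so real scores will differ. The helper name `classic_idf` is made up for illustration, and the other factors that enter the final score (query normalization, coordination) are not modeled.

```python
import math


def classic_idf(doc_freq: int, num_docs: int) -> float:
    """IDF as defined by Lucene's classic TF-IDF similarity."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Document frequencies counted from the five sample profiles in the question.
# Assumption: all five documents live on one shard.
NUM_DOCS = 5
doc_freq = {"gender:male": 3, "likes:cars": 1}

idf = {term: classic_idf(df, NUM_DOCS) for term, df in doc_freq.items()}

for term, value in idf.items():
    print(f"{term}: idf = {value:.3f}")
```

The rarer term, cars, gets the larger IDF, so a boost does not act as a fixed weight: it is multiplied with frequency-based factors like this one.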

Fortunately, you can disable this functionality on a per-field basis, using omit_term_freq_and_positions and omit_norms.

Try setting up your mapping as follows:

curl -XPUT 'http://127.0.0.1:9200/test/?pretty=1'  -d '
{
   "mappings" : {
      "test" : {
         "properties" : {
            "likes" : {
               "index" : "not_analyzed",
               "omit_term_freq_and_positions" : 1,
               "omit_norms" : 1,
               "type" : "string"
            },
            "gender" : {
               "index" : "not_analyzed",
               "omit_term_freq_and_positions" : 1,
               "omit_norms" : 1,
               "type" : "string"
            },
            "age" : {
               "type" : "integer"
            }
         }
      }
   }
}
'

UPDATE: Full working example:

Delete the existing index:

curl -XDELETE 'http://127.0.0.1:9200/users/?pretty=1'

Create the index with the new mapping:

curl -XPUT 'http://127.0.0.1:9200/users/?pretty=1'  -d '
{
   "mappings" : {
      "profile" : {
         "properties" : {
            "likes" : {
               "index" : "not_analyzed",
               "omit_term_freq_and_positions" : 1,
               "type" : "string",
               "omit_norms" : 1
            },
            "age" : {
               "type" : "integer"
            },
            "gender" : {
               "index" : "not_analyzed",
               "omit_term_freq_and_positions" : 1,
               "type" : "string",
               "omit_norms" : 1
            }
         }
      }
   }
}
'

Index the test docs:

curl -XPOST 'http://127.0.0.1:9200/users/profile/_bulk?pretty=1'  -d '
{"index" : {"_id" : 1}}
{"nickname" : "bob", "likes" : "airplanes", "age" : 48, "gender" : "male"}
{"index" : {"_id" : 2}}
{"nickname" : "carlos", "likes" : "food", "age" : 24, "gender" : "male"}
{"index" : {"_id" : 3}}
{"nickname" : "julio", "likes" : "ladies", "age" : 18, "gender" : "male"}
{"index" : {"_id" : 4}}
{"nickname" : "maria", "likes" : "cars", "age" : 25, "gender" : "female"}
{"index" : {"_id" : 5}}
{"nickname" : "anna", "likes" : "clothes", "age" : 50, "gender" : "female"}
'

Refresh the index (to make sure the latest docs are visible to search):

curl -XPOST 'http://127.0.0.1:9200/users/_refresh?pretty=1' 

Search:

curl -XGET 'http://127.0.0.1:9200/users/profile/_search?pretty=1'  -d '
{
   "query" : {
      "bool" : {
         "minimum_number_should_match" : 1,
         "should" : [
            {
               "term" : {
                  "gender" : {
                     "boost" : 10,
                     "term" : "male"
                  }
               }
            },
            {
               "term" : {
                  "likes" : {
                     "boost" : 5,
                     "term" : "cars"
                  }
               }
            },
            {
               "range" : {
                  "age" : {
                     "boost" : 1,
                     "from" : 50
                  }
               }
            }
         ]
      }
   }
}
'

Results:

# {
#    "hits" : {
#       "hits" : [
#          {
#             "_source" : {
#                "nickname" : "bob",
#                "likes" : "airplanes",
#                "age" : 48,
#                "gender" : "male"
#             },
#             "_score" : 0.053500723,
#             "_index" : "users",
#             "_id" : "1",
#             "_type" : "profile"
#          },
#          {
#             "_source" : {
#                "nickname" : "carlos",
#                "likes" : "food",
#                "age" : 24,
#                "gender" : "male"
#             },
#             "_score" : 0.053500723,
#             "_index" : "users",
#             "_id" : "2",
#             "_type" : "profile"
#          },
#          {
#             "_source" : {
#                "nickname" : "julio",
#                "likes" : "ladies",
#                "age" : 18,
#                "gender" : "male"
#             },
#             "_score" : 0.053500723,
#             "_index" : "users",
#             "_id" : "3",
#             "_type" : "profile"
#          },
#          {
#             "_source" : {
#                "nickname" : "anna",
#                "likes" : "clothes",
#                "age" : 50,
#                "gender" : "female"
#             },
#             "_score" : 0.029695695,
#             "_index" : "users",
#             "_id" : "5",
#             "_type" : "profile"
#          },
#          {
#             "_source" : {
#                "nickname" : "maria",
#                "likes" : "cars",
#                "age" : 25,
#                "gender" : "female"
#             },
#             "_score" : 0.015511602,
#             "_index" : "users",
#             "_id" : "4",
#             "_type" : "profile"
#          }
#       ],
#       "max_score" : 0.053500723,
#       "total" : 5
#    },
#    "timed_out" : false,
#    "_shards" : {
#       "failed" : 0,
#       "successful" : 5,
#       "total" : 5
#    },
#    "took" : 4
# }

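UPDATE: Alternative approach

Here I present an alternative query which, while more verbose, gives you more predictable results. It uses the custom filters score query. First, we filter the documents down to those that match at least one of the conditions. Because we use a constant score query, every document starts with a score of 1.

The custom filters score query then lets us boost each document for every filter it matches: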

curl -XGET 'http://127.0.0.1:9200/_all/_search?pretty=1'  -d '
{
   "query" : {
      "custom_filters_score" : {
         "query" : {
            "constant_score" : {
               "filter" : {
                  "or" : [
                     {
                        "term" : {
                           "gender" : "male"
                        }
                     },
                     {
                        "term" : {
                           "likes" : "cars"
                        }
                     },
                     {
                        "range" : {
                           "age" : {
                              "gte" : 50
                           }
                        }
                     }
                  ]
               }
            }
         },
         "score_mode" : "total",
         "filters" : [
            {
               "boost" : "10",
               "filter" : {
                  "term" : {
                     "gender" : "male"
                  }
               }
            },
            {
               "boost" : "5",
               "filter" : {
                  "term" : {
                     "likes" : "cars"
                  }
               }
            },
            {
               "boost" : "1",
               "filter" : {
                  "range" : {
                     "age" : {
                        "gte" : 50
                     }
                  }
               }
            }
         ]
      }
   }
}
'

You will see that the scores associated with each doc are nice round numbers, which are easily traceable to the clauses that matched:

# [Fri Jun  8 21:30:24 2012] Response:
# {
#    "hits" : {
#       "hits" : [
#          {
#             "_source" : {
#                "nickname" : "bob",
#                "likes" : "airplanes",
#                "age" : 48,
#                "gender" : "male"
#             },
#             "_score" : 10,
#             "_index" : "users",
#             "_id" : "1",
#             "_type" : "profile"
#          },
#          {
#             "_source" : {
#                "nickname" : "carlos",
#                "likes" : "food",
#                "age" : 24,
#                "gender" : "male"
#             },
#             "_score" : 10,
#             "_index" : "users",
#             "_id" : "2",
#             "_type" : "profile"
#          },
#          {
#             "_source" : {
#                "nickname" : "julio",
#                "likes" : "ladies",
#                "age" : 18,
#                "gender" : "male"
#             },
#             "_score" : 10,
#             "_index" : "users",
#             "_id" : "3",
#             "_type" : "profile"
#          },
#          {
#             "_source" : {
#                "nickname" : "maria",
#                "likes" : "cars",
#                "age" : 25,
#                "gender" : "female"
#             },
#             "_score" : 5,
#             "_index" : "users",
#             "_id" : "4",
#             "_type" : "profile"
#          },
#          {
#             "_source" : {
#                "nickname" : "anna",
#                "likes" : "clothes",
#                "age" : 50,
#                "gender" : "female"
#             },
#             "_score" : 1,
#             "_index" : "users",
#             "_id" : "5",
#             "_type" : "profile"
#          }
#       ],
#       "max_score" : 10,
#       "total" : 5
#    },
#    "timed_out" : false,
#    "_shards" : {
#       "failed" : 0,
#       "successful" : 20,
#       "total" : 20
#    },
#    "took" : 6
# }
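
The round scores in the response above can be reproduced by hand. The following is a minimal sketch, not Elasticsearch code: it assumes that with "score_mode" : "total" each document's score is the constant base score of 1 multiplied by the sum of the boosts of the filters it matches. The names `boosted_filters` and `total_score` are made up for illustration.

```python
# The five sample profiles from the bulk request.
profiles = [
    {"nickname": "bob", "likes": "airplanes", "age": 48, "gender": "male"},
    {"nickname": "carlos", "likes": "food", "age": 24, "gender": "male"},
    {"nickname": "julio", "likes": "ladies", "age": 18, "gender": "male"},
    {"nickname": "maria", "likes": "cars", "age": 25, "gender": "female"},
    {"nickname": "anna", "likes": "clothes", "age": 50, "gender": "female"},
]

# (boost, predicate) pairs mirroring the "filters" array of the query.
boosted_filters = [
    (10.0, lambda p: p["gender"] == "male"),
    (5.0, lambda p: p["likes"] == "cars"),
    (1.0, lambda p: p["age"] >= 50),
]


def total_score(profile: dict) -> float:
    """Sum the boosts of every filter the profile matches (assumed 'total' mode)."""
    return sum(boost for boost, matches in boosted_filters if matches(profile))


scores = {p["nickname"]: total_score(p) for p in profiles}

# Highest score first, as in a search response.
for nickname, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{nickname}: {score:g}")
```

All five sample profiles match at least one filter, so none of them is excluded by the "or" filter in this sketch. A profile matching several filters, for example a 50-year-old male, would score 11 under the same assumption.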