Question:

Elasticsearch MoreLikeThis query never returns results

薛霄
2023-03-14

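Create a new .NET Framework 4.6.1 console application.

Add the NuGet packages for NEST 6.5.0 and Elasticsearch.Net 6.5.0.

I then created a new Elasticsearch index containing objects (of type "MyThing") that have a "Tags" property. The value of this property is a random, comma-separated set of words drawn from a fixed set of possible values. In my tests I inserted anywhere from 100 to 5000 items into the index, and I tried both more and fewer possible words in the set. This is the query: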

    var result = EsClient.Search<MyThing>(s => s
        .Index(DEFAULT_INDEX)
        .Query(esQuery =>
        {
            var mainQuery = esQuery
                .MoreLikeThis(mlt => mlt
                    .Include(true)
                    .Fields(f => f.Field(ff => ff.Tags, 5))
                    .Like(l => l.Document(d => d.Id(id)))
                );

            return mainQuery;
        }
    ));

Here is the full program that reproduces the problem:

using Nest;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace Test_MoreLikeThis_ES6
{
    class Program
    {
        public class MyThing
        {
            public string Tags { get; set; }
        }

        const string ELASTIC_SERVER = "http://localhost:9200";
        const string DEFAULT_INDEX = "my_index";
        const int NUM_RECORDS = 1000;

        private static Uri es_node = new Uri(ELASTIC_SERVER);
        private static ConnectionSettings settings = new ConnectionSettings(es_node).DefaultIndex(DEFAULT_INDEX);
        private static ElasticClient EsClient = new ElasticClient(settings);

        private static Random rnd = new Random();

        static void Main(string[] args)
        {
            Console.WriteLine("Rebuild index? (y):");
            var answer = Console.ReadLine().ToLower();
            if (answer == "y")
            {
                RebuildIndex();
                for (int i = 0; i < NUM_RECORDS; i++)
                {
                    AddToIndex();
                }
            }

            Console.WriteLine("");
            Console.WriteLine("Getting a Thing...");
            var aThingId = GetARandomThingId();


            Console.WriteLine("");
            Console.WriteLine("Looking for something similar to document with id " + aThingId);
            Console.WriteLine("");
            Console.WriteLine("");

            GetMoreLikeAThing(aThingId);
        }

        private static string GetARandomThingId()
        {
            var firstdocQuery = EsClient
                .Search<MyThing>(s =>
                    s.Size(1)
                    .Query(q => {
                        return q.FunctionScore(fs => fs.Functions(fn => fn.RandomScore(rs => rs.Seed(DateTime.Now.Ticks).Field("_seq_no"))));
                    })
                );

            if (!firstdocQuery.IsValid || firstdocQuery.Hits.Count == 0) return null;

            var hit = firstdocQuery.Hits.First();
            Console.WriteLine("Found a thing with id '" + hit.Id + "' and tags: " + hit.Source.Tags);
            return hit.Id;
        }

        private static void GetMoreLikeAThing(string id)
        {

            var result = EsClient.Search<MyThing>(s => s
                .Index(DEFAULT_INDEX)
                .Query(esQuery =>
                {
                    var mainQuery = esQuery
                        .MoreLikeThis(mlt => mlt
                            .Include(true)
                            .Fields(f => f.Field(ff => ff.Tags, 5))
                            .Like(l => l.Document(d => d.Id(id)))
                        );

                    return mainQuery;
                }

            ));

            if (result.IsValid)
            {
                if (result.Hits.Count > 0)
                {
                    Console.WriteLine("These things are similar:");
                    foreach (var hit in result.Hits)
                    {
                        Console.WriteLine("   " + hit.Id + " : " + hit.Source.Tags);
                    }
                }
                else
                {
                    Console.WriteLine("No similar things found.");
                }

            }
            else
            {
                Console.WriteLine("There was an error running the ES query.");
            }

            Console.WriteLine("");
            Console.WriteLine("Enter (y) to get another thing, or anything else to exit");
            var y = Console.ReadLine().ToLower();

            if (y == "y")
            {
                var aThingId = GetARandomThingId();
                GetMoreLikeAThing(aThingId);
            }

            Console.WriteLine("");
            Console.WriteLine("Any key to exit...");
            Console.ReadKey();

        }

        private static void RebuildIndex()
        {
            var existsResponse = EsClient.IndexExists(DEFAULT_INDEX);
            if (existsResponse.Exists) //delete existing mapping (and data)
            {
                EsClient.DeleteIndex(DEFAULT_INDEX);
            }

            var rebuildResponse = EsClient.CreateIndex(DEFAULT_INDEX, c => c.Settings(s => s.NumberOfReplicas(1).NumberOfShards(5)));
            var response2 = EsClient.Map<MyThing>(m => m.AutoMap());
        }

        private static void AddToIndex()
        {
            var myThing = new MyThing();
            var tags = new List<string> {
                    "catfish",
                    "tractor",
                    "racecar",
                    "airplane",
                    "chicken",
                    "goat",
                    "pig",
                    "horse",
                    "goose",
                    "duck"
                };

            var randNum = rnd.Next(0, tags.Count);

            //get randNum random tags
            var rand = tags.OrderBy(o => Guid.NewGuid().ToString()).Take(randNum);
            myThing.Tags = string.Join(", ", rand);

            var ir = new IndexRequest<MyThing>(myThing);
            var indexResponse = EsClient.Index(ir);

            Console.WriteLine("Index response: " + indexResponse.Id + " : " + string.Join(" " , myThing.Tags));
        }
    }
}

1 answer

鲍俊杰
2023-03-14

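The problem here is that the default min_term_freq of 2 is never satisfied for any term of the prototype document, because every document contains each tag (term) only once. If you lower min_term_freq to 1, you will get results. You may also want to set min_doc_freq to 1, and combine the MoreLikeThis query with a query that excludes the prototype document.

Here's a complete example to play with: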
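To make the effect of the two thresholds concrete, here is a minimal sketch of the term-selection step in plain C#. It does not call NEST or Elasticsearch. `Tokenize` and `SelectTerms` are hypothetical helpers written for illustration, the tokenizer is only a rough stand-in for a real analyzer, and the real query applies further steps (such as scoring terms and capping their number) that are left out here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MltTermSelectionSketch
{
    // Rough stand-in for an analyzer (assumption): split on commas and spaces, then lowercase.
    public static List<string> Tokenize(string text)
    {
        return text
            .Split(new[] { ',', ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(t => t.ToLowerInvariant())
            .ToList();
    }

    // Keeps the prototype's terms that occur at least minTermFreq times in the prototype
    // and appear in at least minDocFreq documents of the corpus.
    public static List<string> SelectTerms(
        string prototype, IReadOnlyList<string> corpus, int minTermFreq, int minDocFreq)
    {
        var tokenizedCorpus = corpus
            .Select(d => new HashSet<string>(Tokenize(d)))
            .ToList();

        return Tokenize(prototype)
            .GroupBy(t => t)
            .Where(g => g.Count() >= minTermFreq)
            .Where(g => tokenizedCorpus.Count(doc => doc.Contains(g.Key)) >= minDocFreq)
            .Select(g => g.Key)
            .ToList();
    }

    public static void Main()
    {
        var corpus = new List<string> { "airplane, goat", "airplane, duck", "goat, pig" };
        const string prototype = "airplane, goat";

        // Defaults (min_term_freq = 2, min_doc_freq = 5): no term survives, so nothing can match.
        Console.WriteLine(SelectTerms(prototype, corpus, 2, 5).Count); // prints 0

        // Lowering only min_term_freq is not enough in this tiny corpus.
        Console.WriteLine(SelectTerms(prototype, corpus, 1, 5).Count); // prints 0

        // With both thresholds at 1, both tags are kept.
        Console.WriteLine(string.Join(", ", SelectTerms(prototype, corpus, 1, 1))); // prints airplane, goat
    }
}
```

In an index with around a thousand documents and only ten possible tags, every tag is likely to appear in far more than five documents, so min_term_freq is the threshold that actually blocks the results there; min_doc_freq matters mainly for small test indexes like the three-document corpus above.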

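// Requires the NEST and Elasticsearch.Net 6.x NuGet packages.
// The members below are meant to be placed inside a class such as Program.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using Elasticsearch.Net;
using Nest;

const string ELASTIC_SERVER = "http://localhost:9200";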
const string DEFAULT_INDEX = "my_index";
const int NUM_RECORDS = 1000;

private static readonly Random _random = new Random();
private static readonly IReadOnlyList<string> Tags = 
    new List<string>
    {
        "catfish",
        "tractor",
        "racecar",
        "airplane",
        "chicken",
        "goat",
        "pig",
        "horse",
        "goose",
        "duck"
    };

private static ElasticClient _client;

private static void Main()
{
    var pool = new SingleNodeConnectionPool(new Uri(ELASTIC_SERVER));

    var settings = new ConnectionSettings(pool)
        .DefaultIndex(DEFAULT_INDEX);

    _client = new ElasticClient(settings);

    Console.WriteLine("Rebuild index? (y):");
    var answer = Console.ReadLine().ToLower();
    if (answer == "y")
    {
        RebuildIndex();
        AddToIndex();
    }

    Console.WriteLine();
    Console.WriteLine("Getting a Thing...");
    var aThingId = GetARandomThingId();

    Console.WriteLine();
    Console.WriteLine("Looking for something similar to document with id " + aThingId);
    Console.WriteLine();
    Console.WriteLine();

    GetMoreLikeAThing(aThingId);
}

public class MyThing
{
    public List<string> Tags { get; set; }
}

private static string GetARandomThingId()
{
    var firstdocQuery = _client
        .Search<MyThing>(s =>
            s.Size(1)
            .Query(q => q
                .FunctionScore(fs => fs
                    .Functions(fn => fn
                        .RandomScore(rs => rs
                            .Seed(DateTime.Now.Ticks)
                            .Field("_seq_no")
                        )
                    )
                )
            )
        );

    if (!firstdocQuery.IsValid || firstdocQuery.Hits.Count == 0) return null;

    var hit = firstdocQuery.Hits.First();
    Console.WriteLine($"Found a thing with id '{hit.Id}' and tags: {string.Join(", ", hit.Source.Tags)}");
    return hit.Id;
}

private static void GetMoreLikeAThing(string id)
{
    var result = _client.Search<MyThing>(s => s
        .Index(DEFAULT_INDEX)
        .Query(esQuery => esQuery 
            .MoreLikeThis(mlt => mlt
                    .Include(true)
                    .Fields(f => f.Field(ff => ff.Tags))
                    .Like(l => l.Document(d => d.Id(id)))
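                    .MinTermFrequency(1)      // default is 2; each tag occurs only once per document
                    .MinDocumentFrequency(1)  // default is 5
            ) && !esQuery // exclude the prototype document itself from the hits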
            .Ids(ids => ids
                .Values(id)
            )
        )
    );

    if (result.IsValid)
    {
        if (result.Hits.Count > 0)
        {
            Console.WriteLine("These things are similar:");
            foreach (var hit in result.Hits)
            {
                Console.WriteLine($"   {hit.Id}: {string.Join(", ", hit.Source.Tags)}");
            }
        }
        else
        {
            Console.WriteLine("No similar things found.");
        }

    }
    else
    {
        Console.WriteLine("There was an error running the ES query.");
    }

    Console.WriteLine();
    Console.WriteLine("Enter (y) to get another thing, or anything else to exit");
    var y = Console.ReadLine().ToLower();

    if (y == "y")
    {
        var aThingId = GetARandomThingId();
        GetMoreLikeAThing(aThingId);
    }

    Console.WriteLine();
    Console.WriteLine("Any key to exit...");
    Console.ReadKey();

}

private static void RebuildIndex()
{
    var existsResponse = _client.IndexExists(DEFAULT_INDEX);
    if (existsResponse.Exists) //delete existing mapping (and data)
    {
        _client.DeleteIndex(DEFAULT_INDEX);
    }

    var rebuildResponse = _client.CreateIndex(DEFAULT_INDEX, c => c
        .Settings(s => s
            .NumberOfShards(1)
        )
        .Mappings(m => m       
            .Map<MyThing>(mm => mm.AutoMap())
        )
    );
}

private static void AddToIndex()
{
    var bulkAllObservable = _client.BulkAll(GetMyThings(), b => b
        .RefreshOnCompleted()
        .Size(1000));

    var waitHandle = new ManualResetEvent(false);
    Exception exception = null;

    var bulkAllObserver = new BulkAllObserver(
        onNext: r =>
        {
            Console.WriteLine($"Indexed page {r.Page}");
        },
        onError: e => 
        {
            exception = e;
            waitHandle.Set();
        },
        onCompleted: () => waitHandle.Set());

    bulkAllObservable.Subscribe(bulkAllObserver);

    waitHandle.WaitOne();

    if (exception != null)
    {
        throw exception;
    }
}

private static IEnumerable<MyThing> GetMyThings()
{
    for (int i = 0; i < NUM_RECORDS; i++)
    {
        var randomTags = Tags.OrderBy(o => Guid.NewGuid().ToString())
            // Take between 1 and Tags.Count tags so that no document ends up without tags
            .Take(_random.Next(1, Tags.Count + 1))
            .OrderBy(t => t)
            .ToList();

        yield return new MyThing { Tags = randomTags };
    }
}

Here is an example of the output:

Found a thing with id 'Ugg9LGkBPK3n91HQD1d5' and tags: airplane, goat
These things are similar:
   4wg9LGkBPK3n91HQD1l5: airplane, goat
   9Ag9LGkBPK3n91HQD1l5: airplane, goat
   Vgg9LGkBPK3n91HQD1d5: airplane, goat, goose
   sQg9LGkBPK3n91HQD1d5: airplane, duck, goat
   lQg9LGkBPK3n91HQD1h5: airplane, catfish, goat
   9gg9LGkBPK3n91HQD1l5: airplane, catfish, goat
   FQg9LGkBPK3n91HQD1p5: airplane, goat, goose
   Jwg9LGkBPK3n91HQD1p5: airplane, goat, goose
   Fwg9LGkBPK3n91HQD1d5: airplane, duck, goat, tractor
   Kwg9LGkBPK3n91HQD1d5: airplane, goat, goose, horse