Lucene: Multi-word phrases as search terms

呼延庆

2023-03-14

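Question:

I am trying to build a searchable phone/local business directory using Apache Lucene.

I have fields for street name, business name, phone number, and so on. The problem is that when I search for a street whose name contains more than one word (e.g. "the crescent"), no results are returned. If I search with just one word (e.g. "crescent"), I get all the results I expect.

I am indexing the data with the following: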

String LocationOfDirectory = "C:\\dir\\index";

StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_34);
Directory index = new SimpleFSDirectory(new File(LocationOfDirectory));

IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_34, analyzer);
IndexWriter w = new IndexWriter(index, config);

// Index one document whose "Street" field is analyzed (tokenized) and stored.
Document doc = new Document();
doc.add(new Field("Street", "the crescent", Field.Store.YES, Field.Index.ANALYZED));

w.addDocument(doc);
w.close();

My search looks like this:

int numberOfHits = 200;
String LocationOfDirectory = "C:\\dir\\index";
TopScoreDocCollector collector = TopScoreDocCollector.create(numberOfHits, true);
Directory directory = new SimpleFSDirectory(new File(LocationOfDirectory));
IndexSearcher searcher = new IndexSearcher(IndexReader.open(directory));

// The whole string is passed as one raw term; no analyzer is applied to it.
WildcardQuery q = new WildcardQuery(new Term("Street", "the crescent"));

searcher.search(q, collector);
ScoreDoc[] hits = collector.topDocs().scoreDocs;

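I have tried replacing the wildcard query with a phrase query, first using the entire string as a single term, and then splitting the string on whitespace and wrapping the phrase in a BooleanQuery like this: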

String term = "the crescent";
BooleanQuery b = new BooleanQuery();
PhraseQuery p = new PhraseQuery();
String[] tokens = term.split(" ");
for(int i = 0 ; i < tokens.length ; ++i)
{
    p.add(new Term("Street", tokens[i]));
}
b.add(p, BooleanClause.Occur.MUST);

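However, this did not work. I tried using KeywordAnalyzer instead of StandardAnalyzer, but then all other types of search stopped working as well. I also tried replacing the spaces with other characters (+ and @) and converting queries to and from that form, but that still did not work. I suspect it fails because + and @ are special characters that are not indexed, but I cannot find a list of such characters anywhere.

I am starting to go mad. Does anyone know what I am doing wrong?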

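Answer:

I found that building the query by hand, without a QueryParser, was what did not work, so I stopped constructing my own queries and used a QueryParser instead. All the recommendations I saw online say that the QueryParser should use the same analyzer that was used during indexing, so I built the QueryParser with a StandardAnalyzer.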

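This works for this example because StandardAnalyzer removes the word "the" from the street name "the crescent" during indexing, so that word cannot be searched for: it is not in the index. Running the query text through the same analyzer drops "the" from the query as well, leaving "crescent", which does match.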
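The effect described above can be illustrated without Lucene at all. The following is a minimal plain-Java sketch, not Lucene's actual implementation: the class `AnalysisSketch`, its methods, and the one-word stop list are hypothetical stand-ins for what an analyzer such as StandardAnalyzer roughly does (lowercase the text, split it into words, drop stop words such as "the"). It shows why a raw, unanalyzed term "the crescent" finds nothing while "crescent" does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class AnalysisSketch {
    // Hypothetical stop list for illustration; a real analyzer's list is longer.
    private static final Set<String> STOP_WORDS =
            new HashSet<String>(Arrays.asList("the"));

    // Rough stand-in for analysis: lowercase, split on whitespace, drop stop words.
    public static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String word : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!word.isEmpty() && !STOP_WORDS.contains(word)) {
                tokens.add(word);
            }
        }
        return tokens;
    }

    // A raw term is compared with the indexed tokens as-is, with no analysis.
    public static boolean rawTermMatches(List<String> indexedTokens, String rawTerm) {
        return indexedTokens.contains(rawTerm);
    }

    public static void main(String[] args) {
        List<String> indexed = analyze("the crescent");
        System.out.println("indexed tokens: " + indexed);
        System.out.println("raw 'the crescent' matches: "
                + rawTermMatches(indexed, "the crescent"));
        System.out.println("raw 'crescent' matches: "
                + rawTermMatches(indexed, "crescent"));
    }
}
```

Under this model the same reasoning would explain the hand-built PhraseQuery in the question: it asks for the term "the" followed by "crescent", but "the" was never written to the index.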

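However, if we search for "Grove Road", the out-of-the-box behavior causes a problem: the query returns every result that contains either "Grove" OR "Road". This is easily fixed by configuring the QueryParser so that its default operator is AND instead of OR.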
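The difference between the two default operators can be sketched in plain Java. This is only an illustration of the matching semantics, not how Lucene evaluates queries; the class `OperatorSketch` and its method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class OperatorSketch {
    // OR semantics: the document matches if it contains at least one query term.
    public static boolean matchesOr(List<String> docTokens, List<String> queryTerms) {
        for (String term : queryTerms) {
            if (docTokens.contains(term)) {
                return true;
            }
        }
        return false;
    }

    // AND semantics: the document matches only if it contains every query term.
    public static boolean matchesAnd(List<String> docTokens, List<String> queryTerms) {
        return docTokens.containsAll(queryTerms);
    }

    public static void main(String[] args) {
        List<String> query = Arrays.asList("grove", "road");
        List<String> groveLane = Arrays.asList("grove", "lane");
        List<String> groveRoad = Arrays.asList("grove", "road");

        System.out.println("OR,  grove lane: " + matchesOr(groveLane, query));
        System.out.println("AND, grove lane: " + matchesAnd(groveLane, query));
        System.out.println("AND, grove road: " + matchesAnd(groveRoad, query));
    }
}
```

Note that AND only requires both words to be present, in any order. If the words must also be adjacent, Lucene's query syntax supports wrapping the text in double quotes to request a phrase match.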

In the end, the correct solution was:

int numberOfHits = 200;
String LocationOfDirectory = "C:\\dir\\index";
TopScoreDocCollector collector = TopScoreDocCollector.create(numberOfHits, true);
Directory directory = new SimpleFSDirectory(new File(LocationOfDirectory));
IndexSearcher searcher = new IndexSearcher(IndexReader.open(directory));

StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_35);

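// Replaces the hand-built WildcardQuery: QueryParser analyzes the query text
// with the same analyzer that was used for indexing.
QueryParser qp = new QueryParser(Version.LUCENE_35, "Street", analyzer);
// Require every term to match instead of the default OR.
qp.setDefaultOperator(QueryParser.Operator.AND);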

Query q = qp.parse("grove road");

searcher.search(q, collector);
ScoreDoc[] hits = collector.topDocs().scoreDocs;
