Using Solarium with SOLR for Search - Implementation

舒俊雄
2023-12-01

This is the third article in a four-part series on using Solarium, in conjunction with Apache’s SOLR search implementation.

In the first part I introduced the key concepts and we installed and set up SOLR. In part two we installed and configured Solarium, a library which enables us to use PHP to “talk” to SOLR as if it were a native component.

Now we’re finally ready to start building the search mechanism, which is the subject of this installment.

Let’s look at how to implement a really simple search:

$query = $client->createSelect();
$query->setQuery(Input::get('q'));

Input::get('q') is simply Laravel’s way of grabbing a GET or POST variable named q which, you’ll remember, is the name of our search form element.

Or better still, use a placeholder to escape the search phrase:

$query->setQuery('%P1%', array(Input::get('q')));

A placeholder is indicated by the % symbols. The letter “P” means “escape this as a Phrase”. The bound variables are passed as an array, and the number indicates the position in the array of the argument you wish to bind; bearing in mind that (perhaps unusually) 1 indicates the first item.
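
To make "escape this as a phrase" a little more concrete, here is a minimal sketch of the idea in plain PHP. The function name `escapeAsPhrase` is hypothetical and this is not Solarium's own code; it only assumes that phrase escaping means backslash-escaping embedded double quotes and backslashes and then wrapping the input in double quotes.

```php
<?php
// Hypothetical stand-in that illustrates phrase escaping; not Solarium's implementation.
function escapeAsPhrase(string $input): string
{
    // Backslash-escape embedded double quotes and backslashes,
    // then wrap the whole input in double quotes so it is treated as one phrase.
    return '"' . addcslashes($input, '"\\') . '"';
}

echo escapeAsPhrase('Star Wars'), PHP_EOL;   // "Star Wars"
echo escapeAsPhrase('the "force"'), PHP_EOL; // "the \"force\""
```

The point of the placeholder is that user input never gets a chance to be interpreted as query syntax, much like a bound parameter in SQL.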

To run the search:

$resultset = $client->select($query);

You can now retrieve the number of results using the getNumFound() method, for example:

printf('Your search yielded %d results:', $resultset->getNumFound());

$resultset is an instance of Solarium\QueryType\Select\Result\Result, which implements the Iterator interface – so you can iterate through the results as follows:

foreach ($resultset as $document) {
    . . .
}

Each result is an instance of Solarium\QueryType\Select\Result\Document, which provides two ways in which you can access the individual fields – either as public properties, e.g.:

<h3><?php print $document->title ?></h3>

Or, you can iterate through the available fields:

foreach($document AS $field => $value)
{
    // this converts multi-value fields to a comma-separated string
    if(is_array($value)) $value = implode(', ', $value);

    print '<strong>' . $field . '</strong>: ' . $value . '<br />';        
}

Note that multi-value fields – such as cast – will return an array; so in the example above, it will simply collapse these fields into a comma-separated list.

Okay, so that’s an overview of how to do it – now let’s plug it into our example application.

We’ll make the search respond to a GET request rather than POST, because it’ll make things easier when we come to look at faceted search. In any case, it’s very common for site searches to use GET.

So the index route on the home controller (our application only has one page, after all) becomes the following:

/**
 * Display the search form / run the search.
 */
public function getIndex()
{

    if (Input::has('q')) {

        // Create a search query
        $query = $this->client->createSelect();

        // Set the query string      
        $query->setQuery('%P1%', array(Input::get('q')));

        // Execute the query and return the result
        $resultset = $this->client->select($query);

        // Pass the resultset to the view and return.
        return View::make('home.index', array(
            'q' => Input::get('q'),
            'resultset' => $resultset,
        ));

    }

    // No query to execute, just return the search form.
    return View::make('home.index');

}

Now let’s modify the view – app/views/home/index.blade.php – so that it displays the search results, as well as a result count, by adding this below the search form:

@if (isset($resultset))    
<header>
    <p>Your search yielded <strong>{{ $resultset->getNumFound() }}</strong> results:</p>
    <hr />
</header>

@foreach ($resultset as $document)

    <h3>{{ $document->title }}</h3>

    <dl>
        <dt>Year</dt>
        <dd>{{ $document->year }}</dd>

        @if (is_array($document->cast))
        <dt>Cast</dt>
        <dd>{{ implode(', ', $document->cast) }}</dd>              
        @endif

    </dl>

    {{ $document->synopsis }}

@endforeach
@endif

Try running a few searches. Quite quickly, you might notice a major limitation. As an example, try searching for “Star Wars”, note the first few results and then do a search for “Mark Hamill”. No results – looks like the search is only taking into account the title attribute, but not the cast.

To alter this behavior we need to use the DisMax component. DisMax is an abbreviation of Disjunction Max. Disjunction means it searches across multiple fields. Max means that if a query matches in several fields, the document's score is taken from the best-scoring field rather than from the sum of all of them (an optional tie-breaker can let the other fields contribute a little).
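
As a rough illustration of how a disjunction-max score is combined, here is a small sketch in plain PHP. The function name `disMaxScore` and the per-field scores are made up for the example; the formula assumed is the usual one, where the best field score is taken and the remaining scores are scaled by a tie-breaker value.

```php
<?php
// Illustrative only: combine per-field scores the way a disjunction-max query does.
// $tie = 0.0 gives a pure "max"; $tie = 1.0 degenerates into a plain sum.
function disMaxScore(array $fieldScores, float $tie = 0.0): float
{
    if ($fieldScores === []) {
        return 0.0;
    }

    $best = max($fieldScores);
    $others = array_sum($fieldScores) - $best;

    return $best + $tie * $others;
}

// Made-up scores for one document that matched in three fields.
$scores = ['title' => 2.0, 'cast' => 0.5, 'synopsis' => 1.0];

printf("%.2f\n", disMaxScore($scores));      // 2.00
printf("%.2f\n", disMaxScore($scores, 0.1)); // 2.15
```

With a tie-breaker of zero, a document that matches strongly in one field outranks a document that matches weakly in several.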

To indicate that we wish to perform a DisMax query:

$dismax = $query->getDisMax();

Then we can tell the search to look in multiple fields – separate them with a space:

$dismax->setQueryFields('title cast synopsis');

Now, if you try searching for “Mark Hamill” again, you’ll see that the search picks up the cast, as well as the title.

We can take our DisMax query one step further by attaching weights to fields. This allows you to prioritize certain fields over others – for example, you probably want title matches to give you a higher score than matching words in the synopsis. Take a look at the following line:

$dismax->setQueryFields('title^3 cast^2 synopsis^1');

This indicates that we want matches on the cast field to be weighted twice as heavily as matches in the synopsis, and matches on the title field three times as heavily. For your own projects, you’ll probably want to experiment with various queries to work out the optimum weightings, which are likely to be very specific to the application in question.
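
If you expect to tune the weights often, it can help to keep them in an array and generate the string from it. The following is only a sketch; `buildQueryFields` is a hypothetical helper of my own, not part of Solarium, and it simply produces the space-separated `field^weight` format shown above.

```php
<?php
// Hypothetical helper: turn a field => weight map into a "field^weight ..." string.
function buildQueryFields(array $weights): string
{
    $parts = [];

    foreach ($weights as $field => $boost) {
        $parts[] = $field . '^' . $boost;
    }

    return implode(' ', $parts);
}

echo buildQueryFields(['title' => 3, 'cast' => 2, 'synopsis' => 1]), PHP_EOL;
// title^3 cast^2 synopsis^1
```

The resulting string can then be passed to `setQueryFields()` in place of the hard-coded one.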

So just to sum up, we can implement searching over multiple fields by modifying app/controllers/HomeController.php as follows:

// Set the query string
$query->setQuery('%P1%', array(Input::get('q')));

// Create a DisMax query
$dismax = $query->getDisMax();

// Set the fields to query, and their relative weights
$dismax->setQueryFields('title^3 cast^2 synopsis^1');

// Execute the query and return the result
$resultset = $this->client->select($query);

Specifying Which Fields to Return

If you run the search and, for each document in the result set, iterate through its fields, you’ll see that by default every field we’ve added to the index is returned. In addition, SOLR adds the _version_ field and the score associated with the search result, along with the unique identifier.

The score is a numeric value which expresses the relevance of the result.

If you wish to change this behavior, there are three methods you can use:

$query->clearFields(); // return no fields

$query->addField('title');  // add 'title' to the list of fields returned

$query->addFields(array('title', 'cast')); // add several fields to the list of those returned

Note that you’ll probably need to use clearFields() in conjunction with addField() or addFields():

$query->clearFields()->addFields(array('title', 'cast'));

Just as in SQL, you can use an asterisk as a wildcard – meaning select all fields:

$query->clearFields()->addFields('*');

Sorting Search Results

By default, search results will be returned in descending order of score. In most cases this is probably what you want; “best matches” appear first.

However, you can change this behavior if you wish, as follows:

$query->addSort('title', 'asc');

The syntax will probably look familiar; it’s very similar to SQL.

Pagination

You can specify the start position – i.e., where to start listing results – and the number of rows to return. Think of it as being like SQL’s LIMIT clause. So for example, to take the first hundred results you’d do this:

$query->setStart(0);
$query->setRows(100);

Armed with the result of getNumFound() and these functions, it should be straightforward to implement pagination, but for brevity I’m not going to go over that here.
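
That said, the arithmetic behind it is short enough to sketch. The helper below is hypothetical (the name `pageWindow` and the array keys are my own) and assumes a 1-based page number taken from the request, plus a total such as the value returned by getNumFound().

```php
<?php
// Hypothetical helper: translate a 1-based page number into start/rows values.
function pageWindow(int $page, int $perPage, int $numFound): array
{
    // Always report at least one page, even for an empty result set.
    $totalPages = max(1, (int) ceil($numFound / $perPage));

    // Clamp out-of-range page numbers instead of failing.
    $page = min(max(1, $page), $totalPages);

    return [
        'start'      => ($page - 1) * $perPage, // value for setStart()
        'rows'       => $perPage,               // value for setRows()
        'page'       => $page,
        'totalPages' => $totalPages,
    ];
}

$window = pageWindow(3, 10, 267);
echo $window['start'], ' ', $window['rows'], ' ', $window['totalPages'], PHP_EOL;
// 20 10 27
```

Because the total is only known after a search has run, one option is to clamp the page on the following request, or simply to accept that a page past the end returns no rows.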

Faceted search essentially allows you to “drill down” through search results based on one or more criteria. It’s probably best illustrated by online stores, where you can refine a product search by things like category, format (e.g. paperbacks vs hardback vs digital books), whether it’s currently in stock or by price range.

Let’s expand our movie search with a basic facet; we’ll allow people to narrow down their movie search by its MPAA rating (a certificate specifying the appropriate age-range for a movie, e.g. “R” or “PG-13”).

To create a facet based on a field, you do this:

$facetSet = $query->getFacetSet();

$facetSet->createFacetField('rating')
    ->setField('rating');

Upon running the search, the result-set can now be broken down based on the value of the field – and you can also display a count for that particular value.

$facet = $resultset->getFacetSet()->getFacet('rating');
foreach($facet as $value => $count) {    
    echo $value . ' [' . $count . ']<br/>';
}

This will give you something along these lines:

Unrated [193]
PG [26]
R [23]
PG-13 [16]
G [9]
NC-17 [0]

A facet doesn’t have to use single, distinct values. You can use ranges – for example, you might have price ranges in an e-commerce site. To illustrate facet ranges in our movie search, we’re going to allow people to narrow their search to movies from a particular decade.

Here’s the code to create the facet:

$facet = $facetSet->createFacetRange('years')
    ->setField('year')
    ->setStart(1900)
    ->setGap(10)
    ->setEnd(2020);

This indicates that we want to create a range-based facet on the year field. We need to specify the start value – the year 1900 – and the end; i.e., the end of the current decade. We also need to set the gap; in other words we want increments of ten – a decade. To display the counts in our search results, we could do something like this:

$facet = $resultset->getFacetSet()->getFacet('years');
foreach($facet as $range => $count) {    
    if ($count) {
        printf("%d's (%d)<br />", $range, $count);
    }
}

This will result in something along these lines:

1970's (12)
1980's (6)
2000's (8)

Note that the facet will contain every possible value, so it’s important to check that the count is non-zero before displaying it.
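
To see why empty buckets turn up, it may help to sketch how the start, gap and end values divide the field into ranges. This is plain PHP with a hypothetical `rangeBuckets` function and made-up counts; it assumes each bucket is identified by its lower bound, which matches the example output above but is worth checking against your own results.

```php
<?php
// Hypothetical illustration: list the lower bound of every bucket between start and end.
function rangeBuckets(int $start, int $end, int $gap): array
{
    $buckets = [];

    for ($lower = $start; $lower < $end; $lower += $gap) {
        $buckets[] = $lower;
    }

    return $buckets;
}

$decades = rangeBuckets(1900, 2020, 10);
echo count($decades), PHP_EOL; // 12

// Made-up counts: most decades have no matches for a given search.
$counts = array_fill_keys($decades, 0);
$counts[1970] = 12;
$counts[1980] = 6;
$counts[2000] = 8;

// Keep only the decades that actually contain results.
$nonEmpty = array_filter($counts);
echo implode(', ', array_keys($nonEmpty)), PHP_EOL; // 1970, 1980, 2000
```

Twelve buckets come back for every search, which is why the display loop skips the ones with a zero count.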

Faceted Search: Filtering

So far we’ve used facets on the search results page to show the counts, but that’s of limited use unless we can allow users to filter their searches on them.

In the search callback, let’s first check whether the MPAA rating filter has been applied:

if (Input::has('rating')) {
    $query->createFilterQuery('rating')->setQuery(sprintf('rating:%s', Input::get('rating')));     
}

Actually, just as with the main search query, we can let Solarium escape the search term rather than use sprintf:

if (Input::has('rating')) { 
    $query->createFilterQuery('rating')->setQuery('rating:%T1%', array(Input::get('rating')));    
}

Remember, the 1 indicates that we wish to use the first element of the array of arguments – it’s not a zero-based array. The T indicates we wish to escape the value as a term (as opposed to P for phrase).

Filtering on decade is slightly more complicated, because we’re filtering based on a range rather than a discrete value. We only have one value specified – in Input::get('decade') – but we know that the upper bound is simply the start of the decade plus nine. So, for example, “the Eighties” is represented by the value 1980, and the range runs from 1980 through (1980 + 9) = 1989.

A range query takes the following form:

field:[x TO y]

So it would be:

year:[1980 TO 1989]

We can implement this as follows:

if (Input::has('decade')) {
    $query->createFilterQuery('years')->setQuery(sprintf('year:[%d TO %d]', Input::get('decade'), (Input::get('decade') + 9)));            
}

Alternatively, we can use a helper. To get an instance of the helper class:

$helper = $query->getHelper();

To use it:

if (Input::has('decade')) {     
    $query->createFilterQuery('years')->setQuery($helper->rangeQuery('year', Input::get('decade'), (Input::get('decade') + 9)));
}

Whilst this may seem fairly academic, it’s worth knowing how to create an instance of the Solarium helper because it’s very useful for other things, such as geospatial support.
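
Whichever way you build the range, the decade arrives as raw request input, so it may be worth validating it first. The sketch below is plain PHP; `decadeRange` and `rangeFilter` are hypothetical names, and the rule that a decade must be a multiple of ten is my own assumption about what counts as valid here.

```php
<?php
// Hypothetical helper: turn raw input into [from, to] bounds, or null if it is not a decade.
function decadeRange($input): ?array
{
    $decade = filter_var($input, FILTER_VALIDATE_INT);

    // Reject non-numeric input and years that do not start a decade.
    if ($decade === false || $decade % 10 !== 0) {
        return null;
    }

    return [$decade, $decade + 9];
}

// Hypothetical helper: build the field:[x TO y] range syntax.
function rangeFilter(string $field, int $from, int $to): string
{
    return sprintf('%s:[%d TO %d]', $field, $from, $to);
}

$bounds = decadeRange('1980');
if ($bounds !== null) {
    echo rangeFilter('year', $bounds[0], $bounds[1]), PHP_EOL; // year:[1980 TO 1989]
}
```

Returning null for bad input lets the controller skip the filter entirely rather than send a malformed range to SOLR.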

Faceted Search: The View

Now that we’ve covered how to set up faceted search, how to list the facets and how to run filters based on them, we can set up the corresponding view.

Open up app/views/home/index.blade.php and modify the search results section to include an additional column, which will contain our facets:

@if (isset($resultset))    
<div class="results row" style="margin-top:1em;">
    <div class="col-sm-4 col-md-4 col-lg-3">
        <?php $facet = $resultset->getFacetSet()->getFacet('rating'); ?>
        <div class="panel panel-primary">
            <div class="panel-heading">
                <h3 class="panel-title">By MPAA Rating</h3>
            </div>
            <ul class="list-group">
                @foreach ($facet as $value => $count)
                    @if ($count)
                    <li class="list-group-item">
                        <a href="?{{ http_build_query(array_merge(Input::all(), array('rating' => $value))) }}">{{ $value }}</a>
                        <span class="badge">{{ $count }}</span>
                    </li>
                    @endif
                @endforeach
            </ul>    
        </div>

        <?php $facet = $resultset->getFacetSet()->getFacet('years'); ?>
        <div class="panel panel-primary">
            <div class="panel-heading">
                <h3 class="panel-title">By Decade</h3>
            </div>
            <ul class="list-group">
                @foreach ($facet as $value => $count)
                    @if ($count)
                    <li class="list-group-item">
                        <a href="?{{ http_build_query(array_merge(Input::all(), array('decade' => $value))) }}">{{ $value }}'s</a>
                        <span class="badge">{{ $count }}</span>
                    </li>
                    @endif
                @endforeach
            </ul>    
        </div>
    </div>
    <div class="col-sm-8 col-md-8 col-lg-9">

        <!-- SEARCH RESULTS GO HERE, EXACTLY AS BEFORE -->

    </div>
</div>
@endif

We’re doing as we discussed in the section on faceted search: grabbing the facet set, iterating through each item and displaying it along with a count of the number of results for that particular value.

Each facet item is a link, which when clicked will refresh the page but with that filter applied. It does this by merging in the appropriate value to the currently “active” set of GET parameters; so if you’ve already filtered on one facet, clicking an item in a different facet-set will maintain that filter by including the appropriate query parameters. It will also maintain your original query, which is set as “q” in the input array.

This approach has some limitations – for one thing, there’s no way to “reset” the filters, except to manually alter the query parameters in the address bar – but its aim is to demonstrate using multiple facets. I’ll leave improving it to you as an additional exercise!
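
As a starting point for that exercise, here is one possible sketch of building both kinds of link in plain PHP. The function names `withFilter` and `withoutFilter` are hypothetical, and the `$current` array stands in for whatever Input::all() returns on a given request.

```php
<?php
// Hypothetical helper: add (or replace) one filter while keeping the other parameters.
function withFilter(array $params, string $name, string $value): string
{
    return '?' . http_build_query(array_merge($params, [$name => $value]));
}

// Hypothetical helper: drop one filter, keeping the query and any other filters.
function withoutFilter(array $params, string $name): string
{
    unset($params[$name]);

    return '?' . http_build_query($params);
}

// Stand-in for the current request parameters.
$current = ['q' => 'star wars', 'rating' => 'PG'];

echo withFilter($current, 'decade', '1980'), PHP_EOL; // ?q=star+wars&rating=PG&decade=1980
echo withoutFilter($current, 'rating'), PHP_EOL;      // ?q=star+wars
```

A "clear" link next to each active facet could then point at the `withoutFilter` URL for that facet's parameter.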

Translated from: https://www.sitepoint.com/using-solarium-solr-search-implementation/
