Matching strings that have at least one word in common

陆仲渊

2023-03-14

Question:

I am running a query to get the URIs of documents that have a specific title. My query is:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>

SELECT ?document WHERE {
  ?document dc:title ?title.
  FILTER (?title = "…" ).
}

where "…" is actually the value of this.getTitle(), because the query string is generated like this:

String queryString = "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> " +
                "PREFIX dc: <http://purl.org/dc/elements/1.1/> SELECT ?document WHERE { " +
                "?document dc:title ?title." +
                "FILTER (?title = \"" + this.getTitle() + "\" ). }";

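With the query above, I only get documents whose title is exactly equal to this.getTitle(). Suppose this.getTitle() consists of more than one word. I would like to get a document even if only one of the words in this.getTitle() appears in the document's title (for example). How can I do that?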

Answer:

Suppose you have some data (in Turtle):

@prefix : <http://stackoverflow.com/q/20203733/1281433> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .

:a dc:title "Great Gatsby" .
:b dc:title "Boring Gatsby" .
:c dc:title "Great Expectations" .
:d dc:title "The Great Muppet Caper" .

Then you can use a query like the following:

prefix : <http://stackoverflow.com/q/20203733/1281433>
prefix dc: <http://purl.org/dc/elements/1.1/>

select ?x ?title where {
  # this is just in place of this.getTitle().  It provides a value for
  # ?TITLE that is "Gatsby Strikes Again".
  values ?TITLE { "Gatsby Strikes Again" }

  # Select a thing and its title.
  ?x dc:title ?title .

  # Then filter based on whether the ?title matches the pattern
  # obtained by replacing the spaces in ?TITLE with "|", matching
  # case insensitively.
  filter( regex( ?title, replace( ?TITLE, " ", "|" ), "i" ))
}

to get results like

------------------------
| x  | title           |
========================
| :b | "Boring Gatsby" |
| :a | "Great Gatsby"  |
------------------------

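What is particularly neat about this is that, since you are generating the pattern on the fly, you can even build it from another value in the graph pattern. For example, if you want all pairs of things whose titles match on at least one word, you can do the following: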

prefix : <http://stackoverflow.com/q/20203733/1281433>
prefix dc: <http://purl.org/dc/elements/1.1/>

select ?x ?xtitle ?y ?ytitle where {
  ?x dc:title ?xtitle .
  ?y dc:title ?ytitle .
  filter( regex( ?xtitle, replace( ?ytitle, " ", "|" ), "i" ) && ?x != ?y )
}
order by ?x ?y

to get:

-----------------------------------------------------------------
| x  | xtitle                   | y  | ytitle                   |
=================================================================
| :a | "Great Gatsby"           | :b | "Boring Gatsby"          |
| :a | "Great Gatsby"           | :c | "Great Expectations"     |
| :a | "Great Gatsby"           | :d | "The Great Muppet Caper" |
| :b | "Boring Gatsby"          | :a | "Great Gatsby"           |
| :c | "Great Expectations"     | :a | "Great Gatsby"           |
| :c | "Great Expectations"     | :d | "The Great Muppet Caper" |
| :d | "The Great Muppet Caper" | :a | "Great Gatsby"           |
| :d | "The Great Muppet Caper" | :c | "Great Expectations"     |
-----------------------------------------------------------------

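Of course, it is very important to note that you are now generating patterns from your data. This means that anyone who can put data into your system could supply a very expensive pattern that bogs down the query and causes a denial of service. More mundanely, you may run into trouble if any of your titles contain characters that have a special meaning in regular expressions. One interesting problem: if something has a title with two consecutive spaces, the pattern becomes The|Words|With||Two|Spaces, and the empty alternative in it can make everything match. This is an interesting approach, but it comes with a lot of caveats.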
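The empty-alternative problem can be reproduced outside SPARQL. The following is a minimal sketch using Java's java.util.regex, which is an assumption here: SPARQL's regex() follows the XPath regular expression syntax, and engines may differ in details, but an empty alternative matches the empty string in both. The class and method names (EmptyAlternative, matchesAnywhere) are hypothetical and only serve the illustration.

```java
import java.util.regex.Pattern;

final class EmptyAlternative {
    // Case-insensitive "find anywhere" match, similar in spirit to regex(?title, pattern, "i").
    public static boolean matchesAnywhere(String pattern, String text) {
        return Pattern.compile(pattern, Pattern.CASE_INSENSITIVE).matcher(text).find();
    }

    public static void main(String[] args) {
        String title = "With  Two"; // note the two consecutive spaces

        // Naive replacement of each single space yields "With||Two".
        String naive = title.replace(" ", "|");
        // Collapsing runs of whitespace first yields "With|Two".
        String collapsed = title.trim().replaceAll("\\s+", "|");

        // The empty alternative in "With||Two" matches the empty string,
        // so an unrelated title is accepted.
        System.out.println(matchesAnywhere(naive, "Great Expectations"));     // true
        System.out.println(matchesAnywhere(collapsed, "Great Expectations")); // false
    }
}
```

Because the second argument of SPARQL's replace() is itself a regular expression, using " +" instead of " " inside the query should collapse runs of spaces in the same way, although leading or trailing spaces in a title would still produce an empty alternative.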

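In general, you can do this as shown here, or by generating the regular expression in your code (where you can take care of escaping and the like), or by using a SPARQL engine that supports a text-search extension (e.g., jena-text, which adds Apache Lucene or Apache Solr to Apache Jena).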
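As a rough sketch of the "generate the regular expression in code" option, the class below builds the alternation from the title on the Java side. Everything in it is hypothetical and not taken from the question's code base: the class name TitlePattern, its three methods, and the list of characters to escape, which is an assumption based on the metacharacters of the XPath regular expression syntax that SPARQL's regex() uses.

```java
import java.util.ArrayList;
import java.util.List;

final class TitlePattern {
    // Assumed set of regex metacharacters to neutralize with a backslash.
    private static final String META = "\\|.-^$?*+{}()[]";

    // Escape one word so that it is matched literally.
    public static String escapeRegex(String word) {
        StringBuilder sb = new StringBuilder();
        for (char c : word.toCharArray()) {
            if (META.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Split on runs of whitespace (so no empty alternative appears) and join with "|".
    // An empty or blank title yields an empty pattern, which the caller should reject.
    public static String buildPattern(String title) {
        List<String> parts = new ArrayList<>();
        for (String word : title.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                parts.add(escapeRegex(word));
            }
        }
        return String.join("|", parts);
    }

    // Quote a string as a SPARQL string literal: escape backslashes first, then double quotes.
    public static String toSparqlLiteral(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        String pattern = buildPattern("The  Great (Muppet) Caper");
        System.out.println(pattern);                  // The|Great|\(Muppet\)|Caper
        System.out.println(toSparqlLiteral(pattern)); // "The|Great|\\(Muppet\\)|Caper"
    }
}
```

The filter in the generated query string would then read something like "FILTER regex(?title, " + TitlePattern.toSparqlLiteral(TitlePattern.buildPattern(this.getTitle())) + ", \"i\")". Since every word is escaped, a title can no longer smuggle in an expensive pattern, though very long titles still produce long alternations.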
