Question:

How do I get word combinations from a MySQL table, grouping words from the previous and next rows?

和嘉澍
2023-03-14

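I'm looking for a more sophisticated version of this approach. My goal is simply to get 1-to-n word combinations from a table in which I store individual words, one per row. The phrases should be built in id order, not in random order.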

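Currently I use MySQL variables, but the problem comes when I try to get combinations of more than 12 words, because there is a limit on the join tables MySQL uses internally. I also haven't figured out how to get all the combinations at once. In the result I only get "word1 word2", "word3 word4" or "word1 word2 word3", "word4 word5 word6", rather than "word1 word2", "word1 word2 word3", "word1 word2 word3 word4", and so on.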

SELECT  (@word3 := CONCAT(@word3, ' ', w.word)) AS word3,
    (@word2 := CONCAT(@word2, ' ', w.word)) AS word2,
    (@word3 := @word2) _word3,
    (@word2 := w.word) _word2,
    w.book_id

FROM    (SELECT @word3:='') _word3,
    (SELECT @word2:='') _word2,
    word w, text t, chapter c, verse v

WHERE   v.number IN (14) AND c.number IN (21) AND c.book_id IN (1) AND t.verse_id = v.id AND w.id = t.word_id AND v.chapter_id = c.id

The chapter, verse and text tables are there to show the larger context, but basically the word table is:

id, 
word, 
numeric_value, 
transliteration, 
translation, 
words_count, 
book_id

I think I hit the join table limit because I also want to concatenate the transliteration, the word count and other fields.

I also tried this in the SELECT:

if(@word3, @word3 := CONCAT(@word3, ' ', w.word), @word3 := '') AS word3,

if(@word2, @word2 := CONCAT(@word2, ' ', w.word), @word2 := '') AS word2,

But for some reason it didn't work.

So what is the best way to get related word combinations from a table where the words are on separate rows?

this
is
a
phrase
containing
words
on
separate
rows

this is, is a, a phrase, phrase containing, containing words, words on,
on separate, separate rows, this is a, is a phrase, a phrase containing,
phrase containing words, containing words on, words on separate,
on separate rows, ...

And so on, with combinations of 4 to n words.

2 answers

杨骏
2023-03-14

This query is based on the following approach for parsing CSV values into multiple rows: http://www.sqlteam.com/article/parsing-csv-values-into-multiple-rows

  1. It extracts each word from the string field

foreach table

pos

1
2
3
4
5
6
7
.
.
.
1000

words table

row_id   phrase

1        an example of phrase to tag with many words
2        hello world!
3        another phrase 1000 characters long

MySQL query

SELECT 
DISTINCT
 @col1_len:=IF(num=1,@col2_len,@col1_len) AS col1_len
,@col1:=IF(num=1,@col2,@col1) AS col1
,@col2_len:=IF(num=1,@col3_len,@col2_len) AS col2_len
,@col2:=IF(num=1,@col3,@col2) AS col2
,@col3_len:=IF(num=1,@col4_len,@col3_len) AS col3_len
,@col3:=IF(num=1,@col4,@col3) AS col3
,@col4_len:=IF(num=1,LOCATE(' ',content, f.pos ) - f.pos,@col4_len) AS col4_len
,@col4:=SUBSTRING(content, f.pos,@col4_len) AS col4
,CASE
WHEN f.num=1
THEN 
   @tag:=@col4
WHEN f.num=2
THEN
   @tag:=TRIM(CONCAT(@col3,' ',@col4))
WHEN f.num=3
THEN 
   @tag:=TRIM(CONCAT(@col2,' ',@col3,' ',@col4))
WHEN f.num=4
THEN 
   @tag:=TRIM(CONCAT(@col1,' ',@col2,' ',@col3,' ',@col4))
END AS tag
,CASE
WHEN f.num=1
THEN 
   @valid_tag:=IF(@col4_len > 2,1,0)
WHEN f.num=2
THEN
   @valid_tag:=IF(@col3_len > 2,1,0)
WHEN f.num=3
THEN 
   @valid_tag:=IF(@col2_len > 2,1,0)
WHEN f.num=4
THEN 
   @valid_tag:=IF(@col1_len > 2,1,0)
END AS valid_tag
FROM 
(SELECT 
CONCAT(' ' ,phrase, ' ') AS content
,f1.pos AS pos
,f2.pos AS num 
,@col1:=''
,@col2:=''
,@col3:=''
,@col4:=''
,@col1_len:=0
,@col2_len:=0
,@col3_len:=0
,@col4_len:=0
FROM words
LEFT JOIN foreach f1
ON ( f1.pos < content_len+1 )
LEFT JOIN foreach f2 
ON ( f2.pos < 5 )
WHERE 1=1
AND row_id IN (1) 
)f 
WHERE 1=1
AND SUBSTRING(content, f.pos - 1, 1)=' '
HAVING 
LENGTH(tag)>2
AND valid_tag=1
AND col4_len > 2
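The variable juggling in the query can be hard to follow, so here is a minimal Python sketch of what it appears to compute: every run of 1 to 4 consecutive words whose first and last word are longer than 2 characters. This reading is an assumption, inferred from the CASE and HAVING clauses and checked against the example output for row 1. The function name `phrase_tags` and its parameters are hypothetical and not part of the original answer.

```python
def phrase_tags(phrase, max_words=4, min_len=3):
    """Return runs of 1..max_words consecutive words whose first and last
    word are both at least min_len characters long (assumed reading of
    the query's "> 2" length checks)."""
    words = phrase.split()
    tags = []
    for start in range(len(words)):
        for n in range(1, max_words + 1):
            end = start + n - 1
            if end >= len(words):
                break  # the run would extend past the end of the phrase
            # Keep the run only if both boundary words are long enough.
            if len(words[start]) >= min_len and len(words[end]) >= min_len:
                tags.append(" ".join(words[start:end + 1]))
    return tags


if __name__ == "__main__":
    for tag in phrase_tags("an example of phrase to tag with many words"):
        print(tag)
```

Unlike the query, this sketch does not apply DISTINCT, so a phrase with repeated words can yield repeated tags.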

Example output

tag

example
example of phrase
phrase
phrase to tag
phrase to tag with
tag
tag with
tag with many
tag with many words
with
with many
with many words
many
many words
words

宗政唯
2023-03-14

With your current schema, you can use GROUP_CONCAT():

SELECT   GROUP_CONCAT(w.word)
FROM     chapter c
    JOIN verse   v ON v.chapter_id = c.id
    JOIN text    t ON t.verse_id   = v.id
    JOIN word    w ON w.id         = t.word_id
WHERE    c.book_id = 1
     AND c.number  = 21
     AND v.number  = 14
GROUP BY c.book_id, c.number, v.number

However, this seems somewhat over-normalized; if appropriate, why not store the complete text in the text table and use a FULLTEXT index?
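Note that GROUP_CONCAT() as written returns one concatenated string per verse, not the overlapping combinations from the question. One possible way to enumerate those without user variables is a recursive common table expression that extends each phrase by the word with the next id. The sketch below is not from either answer. It runs the SQL against an in-memory SQLite database through Python's sqlite3 module so that it is self-contained, and it assumes a simplified, hypothetical `word(id, word)` table with gap-free ids. A MySQL port would need version 8.0 or later for WITH RECURSIVE, CONCAT() in place of `||`, and most likely a CAST on the seed column so that longer phrases fit.

```python
import sqlite3

# Recursive CTE: start from every single word, then repeatedly append the
# word whose id directly follows the last one used (assumes ids have no gaps).
NGRAM_SQL = """
WITH RECURSIVE ngram(first_id, last_id, n, phrase) AS (
    SELECT id, id, 1, word FROM word
    UNION ALL
    SELECT g.first_id, w.id, g.n + 1, g.phrase || ' ' || w.word
    FROM ngram AS g
    JOIN word AS w ON w.id = g.last_id + 1
    WHERE g.n < :max_n
)
SELECT phrase FROM ngram
WHERE n >= :min_n
ORDER BY n, first_id
"""


def word_combinations(words, min_n=2, max_n=3):
    """Return all runs of min_n..max_n consecutive words, shortest first."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE word (id INTEGER PRIMARY KEY, word TEXT NOT NULL)")
        conn.executemany(
            "INSERT INTO word (id, word) VALUES (?, ?)",
            list(enumerate(words, start=1)),
        )
        rows = conn.execute(NGRAM_SQL, {"min_n": min_n, "max_n": max_n})
        return [row[0] for row in rows]
    finally:
        conn.close()


if __name__ == "__main__":
    sample = "this is a phrase containing words on separate rows".split()
    print(", ".join(word_combinations(sample)))
```

Because each step adds exactly one row per phrase, the number of joined tables stays constant no matter how long the combinations get. The extra columns from the question (transliteration and so on) could be carried along the same way as `phrase`.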
