Matching POS tags and word sequences

陈瀚

2023-03-14

Question:

I have the following two POS-tagged strings:

Sent1: "something like how writer pro or phraseology works would be really cool."

[('something', 'NN'), ('like', 'IN'), ('how', 'WRB'), ('writer', 'NN'), ('pro', 'NN'),
('or', 'CC'), ('phraseology', 'NN'), ('works', 'NNS'), ('would', 'MD'), ('be', 'VB'),
('really', 'RB'), ('cool', 'JJ'), ('.', '.')]

Sent2: "more options like the syntax editor would be nice"

[('more', 'JJR'), ('options', 'NNS'), ('like', 'IN'), ('the', 'DT'), ('syntax', 'NN'),
('editor', 'NN'), ('would', 'MD'), ('be', 'VB'), ('nice', 'JJ')]

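I am looking for a way to detect (return True) whether these strings contain the sequence "would" + "be" + adjective, regardless of where the adjective sits, as long as it comes after "would" "be". In the second string the adjective "nice" immediately follows "would be", but in the first string it does not.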

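The trivial case (no other word before the adjective, as in "would be nice") was solved in my earlier question: detecting POS tag pattern along with specified words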

I am now looking for a more general solution in which optional words may occur before the adjective. I am new to NLTK and Python.
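To make the requested behavior concrete, here is a minimal sketch in plain Python. It assumes the input is a list of (word, tag) tuples like the two shown above, and that any tag starting with JJ (JJ, JJR, JJS) counts as an adjective. The function name has_would_be_adjective is hypothetical, and this is only one reading of the requirement, not the approach used in the answer.

```python
def has_would_be_adjective(tagged):
    """Return True if "would" "be" is followed, anywhere later, by an adjective."""
    words = [word.lower() for word, _ in tagged]
    for i in range(len(tagged) - 1):
        if words[i] == "would" and words[i + 1] == "be":
            # Only tokens after the "would be" pair are considered, so an
            # adjective earlier in the sentence (e.g. "more") does not count.
            if any(tag.startswith("JJ") for _, tag in tagged[i + 2:]):
                return True
    return False


sent1 = [("something", "NN"), ("like", "IN"), ("how", "WRB"), ("writer", "NN"),
         ("pro", "NN"), ("or", "CC"), ("phraseology", "NN"), ("works", "NNS"),
         ("would", "MD"), ("be", "VB"), ("really", "RB"), ("cool", "JJ"), (".", ".")]
sent2 = [("more", "JJR"), ("options", "NNS"), ("like", "IN"), ("the", "DT"),
         ("syntax", "NN"), ("editor", "NN"), ("would", "MD"), ("be", "VB"),
         ("nice", "JJ")]

print(has_would_be_adjective(sent1))
print(has_would_be_adjective(sent2))
```

Both sentences match under this reading, because the adverb "really" between "would be" and "cool" is simply skipped. A stricter variant might limit how many tokens may intervene or which tags they may carry.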

Answer:

First, install nltk_cli by following the instructions at https://github.com/alvations/nltk_cli

Then, here is a lesser-known function in nltk_cli that you may find useful:

alvas@ubi:~/git/nltk_cli$ cat infile.txt 
something like how writer pro or phraseology works would be really cool .
more options like the syntax editor would be nice
alvas@ubi:~/git/nltk_cli$ python senna.py --chunk2 VP+ADJP infile.txt 
would be    really cool
would be    nice

To illustrate other possible uses:

alvas@ubi:~/git/nltk_cli$ python senna.py --chunk2 VP+VP infile.txt 
!!! NO CHUNK of VP+VP in this sentence !!!
!!! NO CHUNK of VP+VP in this sentence !!!
alvas@ubi:~/git/nltk_cli$ python senna.py --chunk2 NP+VP infile.txt 
how writer pro or phraseology works would be
the syntax editor   would be
alvas@ubi:~/git/nltk_cli$ python senna.py --chunk2 VP+NP infile.txt 
!!! NO CHUNK of VP+NP in this sentence !!!
!!! NO CHUNK of VP+NP in this sentence !!!

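Then, if you want to check whether a sentence contains the phrase and output True / False, just read the nltk_cli output, iterate over it, and test each line with an if-else condition.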
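The if-else check could look roughly like the sketch below. It assumes the output format seen in the transcripts above: one line per input sentence, with a line starting with "!!! NO CHUNK" when nothing matched. The names NO_CHUNK_PREFIX, chunk_found and check_output are hypothetical helpers, not part of nltk_cli, and the sample strings are copied from the output shown earlier rather than produced by running senna.py.

```python
NO_CHUNK_PREFIX = "!!! NO CHUNK"  # marker seen in the output above when no chunk matched


def chunk_found(output_line):
    """Return True if an output line reports a matched chunk."""
    line = output_line.strip()
    return bool(line) and not line.startswith(NO_CHUNK_PREFIX)


def check_output(output_text):
    """Map each non-empty output line to True / False."""
    return [chunk_found(line) for line in output_text.splitlines() if line.strip()]


# Output copied from the VP+ADJP and VP+VP runs shown earlier.
vp_adjp_output = "would be    really cool\nwould be    nice\n"
vp_vp_output = (
    "!!! NO CHUNK of VP+VP in this sentence !!!\n"
    "!!! NO CHUNK of VP+VP in this sentence !!!\n"
)

print(check_output(vp_adjp_output))
print(check_output(vp_vp_output))
```

In practice the text would come from a file or from capturing the command's standard output. A VP+ADJP match only says that some verb phrase is followed by an adjective phrase, so to require "would be" specifically you would also check that the matched line starts with those words.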
