Question:

Stanford NLP TokensRegex: NER not recognized

朱经武

2023-03-14

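I pieced the following together from several web searches. I can get the simple regex /John/ to match, but nothing I have tried matches when NER is involved (all of these are examples copied from web searches and slightly adjusted).

Edit for clarity: success/failure means the true/false returned by matcher2.matches() in the code below.

I don't know whether I need to explicitly reference a model or an annotation or something similar, whether I am missing something else, or whether I am simply going about this the wrong way.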

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.ling.tokensregex.TokenSequenceMatcher;
import edu.stanford.nlp.ling.tokensregex.TokenSequencePattern;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.junit.Test;

public class StanfordSandboxTest {
    private static final Log log = LogFactory.getLog(StanfordSandboxTest.class);

    @Test
    public void testFirstAttempt() {

        Properties props2;
        StanfordCoreNLP pipeline2;
        TokenSequencePattern pattern2;
        Annotation document2;
        List<CoreMap> sentences2;
        TokenSequenceMatcher matcher2;
        String text2;

        props2 = new Properties();
        props2.put("annotators", "tokenize, ssplit, pos, lemma, ner, regexner, parse, dcoref");
        pipeline2 = new StanfordCoreNLP(props2);
        text2 = "March 1, 1999";
        pattern2 = TokenSequencePattern.compile("pattern: (([{ner:DATE}])");
        document2 = new Annotation(text2);
        pipeline2.annotate(document2);
        sentences2 = document2.get(CoreAnnotations.SentencesAnnotation.class);
        matcher2 = pattern2.getMatcher(sentences2);
        log.info("testFirstAttempt: Matches2: " + matcher2.matches());

        props2 = new Properties();
        props2.put("annotators", "tokenize, ssplit, pos, lemma, ner, regexner, parse, dcoref");
        pipeline2 = new StanfordCoreNLP(props2);
        text2 = "John";
        pattern2 = TokenSequencePattern.compile("/John/");
        document2 = new Annotation(text2);
        pipeline2.annotate(document2);
        sentences2 = document2.get(CoreAnnotations.SentencesAnnotation.class);
        matcher2 = pattern2.getMatcher(sentences2);
        log.info("testFirstAttempt: Matches2: " + matcher2.matches());
    }
}

1 answer

顾学真

2023-03-14

Sample code:

package edu.stanford.nlp.examples;

import edu.stanford.nlp.util.*;
import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.pipeline.*;

import java.util.*;


public class TokensRegexExampleTwo {

  public static void main(String[] args) {

    // set up properties
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,tokensregex");
    props.setProperty("tokensregex.rules", "multi-step-per-org.rules");
    props.setProperty("tokensregex.caseInsensitive", "true");

    // set up pipeline
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    // set up text to annotate
    Annotation annotation = new Annotation("Joe Smith works for Apple Inc.");

    // annotate text
    pipeline.annotate(annotation);

    // print out found entities
    for (CoreMap sentence : annotation.get(CoreAnnotations.SentencesAnnotation.class)) {
      for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
        System.out.println(token.word() + "\t" + token.ner());
      }
    }
  }
}

Sample rules file:

ner = { type: "CLASS", value: "edu.stanford.nlp.ling.CoreAnnotations$NamedEntityTagAnnotation" }

$ORGANIZATION_TITLES = "/inc\.|corp\./"

$COMPANY_INDICATOR_WORDS = "/company|corporation/"

ENV.defaults["stage"] = 1

{ pattern: (/works/ /for/ ([{pos: NNP}]+ $ORGANIZATION_TITLES)), action: (Annotate($1, ner, "RULE_FOUND_ORG") ) }

ENV.defaults["stage"] = 2

{ pattern: (([{pos: NNP}]+) /works/ /for/ [{ner: "RULE_FOUND_ORG"}]), action: (Annotate($1, ner, "RULE_FOUND_PERS") ) }

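This applies NER tags to "Joe Smith" and "Apple Inc.". You can adapt it to your specific case. Please let me know if you want to do something more advanced than just applying NER tags. Note: make sure to put these rules in a file named multi-step-per-org.rules.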
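
To make the two-stage logic of the rules file easier to follow, here is a minimal plain-Java sketch that mimics it without CoreNLP. Everything in it is an assumption made for illustration: the Tok class, the method names, and the hand-assigned POS tags are hypothetical stand-ins, every token starts with the tag "O", and the real TokensRegex engine works differently internally. It only shows why stage 2 depends on the tag written by stage 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class StagedRulesSketch {

    // Hypothetical stand-in for a CoreNLP token: a word, a POS tag, and a mutable NER tag.
    static final class Tok {
        final String word;
        final String pos;
        String ner = "O";

        Tok(String word, String pos) {
            this.word = word;
            this.pos = pos;
        }
    }

    // Case-insensitive counterpart of $ORGANIZATION_TITLES in the rules file.
    static boolean isOrgTitle(String word) {
        return word.toLowerCase(Locale.ROOT).matches("inc\\.|corp\\.");
    }

    // True if the tokens at i and i + 1 are "works" and "for".
    static boolean isWorksFor(List<Tok> toks, int i) {
        return i + 1 < toks.size()
                && toks.get(i).word.equalsIgnoreCase("works")
                && toks.get(i + 1).word.equalsIgnoreCase("for");
    }

    // Stage 1: "works for", then one or more NNP tokens, then an organization title.
    static void stageOne(List<Tok> toks) {
        for (int i = 0; i < toks.size(); i++) {
            if (!isWorksFor(toks, i)) continue;
            int start = i + 2;
            int end = start; // first index after the run of NNP tokens
            while (end < toks.size() && toks.get(end).pos.equals("NNP")) end++;
            // Try the longest span first, like a greedy "+" that backtracks.
            for (int title = Math.min(end, toks.size() - 1); title > start; title--) {
                if (isOrgTitle(toks.get(title).word)) {
                    for (int k = start; k <= title; k++) toks.get(k).ner = "RULE_FOUND_ORG";
                    break;
                }
            }
        }
    }

    // Stage 2: one or more NNP tokens, then "works for", then a token tagged in stage 1.
    static void stageTwo(List<Tok> toks) {
        for (int i = 0; i < toks.size(); i++) {
            if (!isWorksFor(toks, i)) continue;
            boolean orgFollows = i + 2 < toks.size()
                    && toks.get(i + 2).ner.equals("RULE_FOUND_ORG");
            if (!orgFollows) continue;
            // Walk left over the NNP run that precedes "works".
            for (int k = i - 1; k >= 0 && toks.get(k).pos.equals("NNP"); k--) {
                toks.get(k).ner = "RULE_FOUND_PERS";
            }
        }
    }

    // Runs both stages on the example sentence; POS tags are assigned by hand here.
    static List<String> tagExample() {
        String[][] tagged = {
            {"Joe", "NNP"}, {"Smith", "NNP"}, {"works", "VBZ"},
            {"for", "IN"}, {"Apple", "NNP"}, {"Inc.", "NNP"}
        };
        List<Tok> toks = new ArrayList<>();
        for (String[] t : tagged) toks.add(new Tok(t[0], t[1]));
        stageOne(toks);
        stageTwo(toks);
        List<String> out = new ArrayList<>();
        for (Tok t : toks) out.add(t.word + "\t" + t.ner);
        return out;
    }

    public static void main(String[] args) {
        tagExample().forEach(System.out::println);
    }
}
```

If stageTwo ran before stageOne in this sketch, no token would carry RULE_FOUND_ORG yet and "Joe Smith" would stay untagged, which is presumably why the rules file assigns the two patterns to separate stages.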
