Customizing Stanford NER to classify software programming keywords

卞浩漫

2023-03-14

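Question:

I am new to NLP, and I am using the Stanford NER tool to classify some random text and extract the special keywords used in software programming.

The problem is that I don't know how to change the classifiers and text annotators in Stanford NER so that they recognize software programming keywords. For example: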

today Java used in different operating systems (Windows, Linux, ..)

The classification result should be:

Java "Programming_Language"
Windows "Operating_System"
Linux "Operating_System"

Could you help me with how to customize the Stanford NER classifiers to meet my needs?

Answer:

I think this is well documented in the Stanford NER FAQ: http://nlp.stanford.edu/software/crf-faq.shtml#a

The steps are as follows:

In the properties file, change the map to specify how your training data is annotated (or structured):

    map = word=0,myfeature=1,answer=2
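To make the column mapping concrete, here is a minimal sketch of what training rows could look like under that map. It assumes the training file is tab-separated with one token per line and that `O` marks tokens outside any entity. The class name `TrainingRowSketch`, the helper `row`, and the `KEYWORD`/`NONE` feature values are made up for illustration and are not part of Stanford NER.

```java
import java.util.ArrayList;
import java.util.List;

final class TrainingRowSketch {
    // Builds one training line; column order follows map = word=0,myfeature=1,answer=2.
    static String row(String word, String myFeature, String answer) {
        return String.join("\t", word, myFeature, answer);
    }

    // A few tokens from the question's example sentence (feature values are hypothetical).
    static List<String> exampleRows() {
        List<String> rows = new ArrayList<>();
        rows.add(row("today", "NONE", "O"));
        rows.add(row("Java", "KEYWORD", "Programming_Language"));
        rows.add(row("used", "NONE", "O"));
        rows.add(row("Windows", "KEYWORD", "Operating_System"));
        rows.add(row("Linux", "KEYWORD", "Operating_System"));
        return rows;
    }

    public static void main(String[] args) {
        // Print the rows as they would appear in the training file.
        exampleRows().forEach(System.out::println);
    }
}
```

Each printed line has exactly three columns, so column 1 is what the new `myfeature` annotation would carry during training.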

In src\edu\stanford\nlp\sequences\SeqClassifierFlags.java, add a flag stating that you want to use your new feature; let's call it useMyFeature. Below public boolean useLabelSource = false, add public boolean useMyFeature = true;

In the same file, in the method setProperties(Properties props, boolean printProps), after else if (key.equalsIgnoreCase("useTrainLexicon")) { ..}, tell the tool whether this flag is on or off for you:

    else if (key.equalsIgnoreCase("useMyFeature")) {
      useMyFeature = Boolean.parseBoolean(val);
    }
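For anyone unsure what that branch does, here is a small standalone sketch of the same pattern, so you can try it without rebuilding Stanford NER. `FlagParsingSketch` is a made-up stand-in for SeqClassifierFlags and only handles the one flag; the real setProperties method has many more branches and a different signature.

```java
import java.util.Properties;

final class FlagParsingSketch {
    // Same default as the field added to SeqClassifierFlags in the step above.
    boolean useMyFeature = true;

    // Simplified stand-in for setProperties: walk the properties and set matching flags.
    void setProperties(Properties props) {
        for (String key : props.stringPropertyNames()) {
            String val = props.getProperty(key);
            if (key.equalsIgnoreCase("useMyFeature")) {
                useMyFeature = Boolean.parseBoolean(val);
            }
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("useMyFeature", "false");
        FlagParsingSketch flags = new FlagParsingSketch();
        flags.setProperties(props);
        System.out.println("useMyFeature = " + flags.useMyFeature);
    }
}
```

Note that Boolean.parseBoolean returns true only for the string "true" (ignoring case), so any other value in the properties file turns the flag off.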

In src/edu/stanford/nlp/ling/CoreAnnotations.java, add the following section:

    public static class myfeature implements CoreAnnotation<String> {
      public Class<String> getType() {
        return String.class;
      }
    }

In src/edu/stanford/nlp/ling/AnnotationLookup.java, at the bottom of public enum KeyLookup {..}, add:

    MY_TAG(CoreAnnotations.myfeature.class, "myfeature")

In src\edu\stanford\nlp\ie\NERFeatureFactory.java, depending on the "type" of feature it is, add the following inside

    protected Collection<String> featuresC(PaddedList<IN> cInfo, int loc)

    // Use the flag defined in SeqClassifierFlags above
    if (flags.useMyFeature) {
      featuresC.add(c.get(CoreAnnotations.myfeature.class) + "-my_tag");
    }
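To see what that snippet contributes, here is a rough standalone sketch of the feature string it builds. A plain Map stands in for the token that NERFeatureFactory reads annotations from, and `FeatureStringSketch` and its method are hypothetical names; the real featuresC works on the padded token list and adds many other features as well.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

final class FeatureStringSketch {
    // Mimics the added branch: when the flag is on, emit "<myfeature value>-my_tag".
    static Collection<String> featuresC(Map<String, String> token, boolean useMyFeature) {
        Collection<String> features = new ArrayList<>();
        if (useMyFeature) {
            features.add(token.get("myfeature") + "-my_tag");
        }
        return features;
    }

    public static void main(String[] args) {
        // Hypothetical token: column 1 of the training row ends up under "myfeature".
        Map<String, String> token = new HashMap<>();
        token.put("word", "Java");
        token.put("myfeature", "KEYWORD");
        System.out.println(featuresC(token, true));
    }
}
```

The "-my_tag" suffix just keeps this feature's values from colliding with strings produced by other features.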

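Debugging: in addition to this, there are methods that dump the features to a file; use them to see how things are going. Also, I think you will have to spend some time in the debugger too :P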
