How to use OpenNLP with Java?

方飞鸣
2023-12-01

小编典典

Here is some (old) sample code I put together, with modernized code to follow:

    package opennlp;

    import opennlp.tools.cmdline.PerformanceMonitor;
    import opennlp.tools.cmdline.postag.POSModelLoader;
    import opennlp.tools.postag.POSModel;
    import opennlp.tools.postag.POSSample;
    import opennlp.tools.postag.POSTaggerME;
    import opennlp.tools.tokenize.WhitespaceTokenizer;
    import opennlp.tools.util.ObjectStream;
    import opennlp.tools.util.PlainTextByLineStream;

    import java.io.File;
    import java.io.IOException;
    import java.io.StringReader;

    public class OpenNlpTest {
        public static void main(String[] args) throws IOException {
            // Load the pre-trained English POS model from the working directory.
            POSModel model = new POSModelLoader().load(new File("en-pos-maxent.bin"));
            PerformanceMonitor perfMon = new PerformanceMonitor(System.err, "sent");
            POSTaggerME tagger = new POSTaggerME(model);

            String input = "Can anyone help me dig through OpenNLP's horrible documentation?";
            // The type parameter is required; with a raw ObjectStream, read() returns Object.
            ObjectStream<String> lineStream =
                    new PlainTextByLineStream(new StringReader(input));

            perfMon.start();
            String line;
            while ((line = lineStream.read()) != null) {
                // Split on whitespace only, then tag each token.
                String[] whitespaceTokenizerLine = WhitespaceTokenizer.INSTANCE.tokenize(line);
                String[] tags = tagger.tag(whitespaceTokenizerLine);

                // POSSample pairs each token with its tag and prints them as token_TAG.
                POSSample sample = new POSSample(whitespaceTokenizerLine, tags);
                System.out.println(sample.toString());

                perfMon.incrementCounter();
            }
            perfMon.stopAndPrintFinalResult();
        }
    }

The output is:

    Loading POS Tagger model ... done (2.045s)
    Can_MD anyone_NN help_VB me_PRP dig_VB through_IN OpenNLP's_NNP horrible_JJ documentation?_NN

    Average: 76.9 sent/s
    Total: 1 sent
    Runtime: 0.013s

This is basically working from the POSTaggerTool class included as part of OpenNLP. sample.getTags() returns a String array that holds the tag types themselves.
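When the POSSample object is at hand, sample.getTags() is the direct route. If only the printed line is available (for example, captured from a log), the token/tag pairs can be recovered from the token_TAG format seen in the output above. The following is a minimal sketch using only the JDK; TaggedLineParser is a hypothetical helper and not part of OpenNLP, and it assumes the tag never contains an underscore.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of OpenNLP: splits "token_TAG" output back into pairs.
class TaggedLineParser {

    /** Returns one {token, tag} pair per whitespace-separated "token_TAG" item. */
    static List<String[]> parse(String taggedLine) {
        List<String[]> pairs = new ArrayList<>();
        for (String item : taggedLine.trim().split("\\s+")) {
            // Split on the LAST underscore so tokens that contain '_' stay intact.
            int cut = item.lastIndexOf('_');
            if (cut < 0) {
                continue; // not a token_TAG item; skip it
            }
            pairs.add(new String[] {item.substring(0, cut), item.substring(cut + 1)});
        }
        return pairs;
    }

    public static void main(String[] args) {
        String line = "Can_MD anyone_NN help_VB me_PRP dig_VB through_IN "
                + "OpenNLP's_NNP horrible_JJ documentation?_NN";
        for (String[] pair : parse(line)) {
            System.out.println(pair[0] + " -> " + pair[1]);
        }
    }
}
```

Note that the whitespace tokenizer leaves punctuation attached, so the last token comes back as "documentation?" rather than "documentation".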

This requires direct file access to the training data (the en-pos-maxent.bin model file), which is really, really lame.

An updated codebase for this is a little different (and probably more useful).

First, a Maven POM:

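    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
        <modelVersion>4.0.0</modelVersion>

        <groupId>org.javachannel</groupId>
        <artifactId>opennlp-example</artifactId>
        <version>1.0-SNAPSHOT</version>

        <dependencies>
            <dependency>
                <groupId>org.apache.opennlp</groupId>
                <artifactId>opennlp-tools</artifactId>
                <version>1.6.0</version>
            </dependency>
            <dependency>
                <groupId>org.testng</groupId>
                <artifactId>testng</artifactId>
                <version>[6.8.21,)</version>
                <scope>test</scope>
            </dependency>
        </dependencies>

        <build>
            <plugins>
                <plugin>
                    <groupId>org.apache.maven.plugins</groupId>
                    <artifactId>maven-compiler-plugin</artifactId>
                    <version>3.1</version>
                    <configuration>
                        <source>1.8</source>
                        <target>1.8</target>
                    </configuration>
                </plugin>
            </plugins>
        </build>
    </project>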

Here's the code, written as a test, so it is located at ./src/test/java/org/javachannel/opennlp/example:

    package org.javachannel.opennlp.example;

    import opennlp.tools.cmdline.PerformanceMonitor;
    import opennlp.tools.postag.POSModel;
    import opennlp.tools.postag.POSSample;
    import opennlp.tools.postag.POSTaggerME;
    import opennlp.tools.tokenize.WhitespaceTokenizer;
    import org.testng.annotations.DataProvider;
    import org.testng.annotations.Test;

    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.net.URL;
    import java.nio.channels.Channels;
    import java.nio.channels.ReadableByteChannel;
    import java.util.stream.Stream;

    public class POSTest {
        private void download(String url, File destination) throws IOException {
            URL website = new URL(url);
            // try-with-resources closes both the channel and the file, even on failure.
            try (ReadableByteChannel rbc = Channels.newChannel(website.openStream());
                 FileOutputStream fos = new FileOutputStream(destination)) {
                fos.getChannel().transferFrom(rbc, 0, Long.MAX_VALUE);
            }
        }

        @DataProvider
        Object[][] getCorpusData() {
            // Each row supplies a single Object[] argument holding the sentences to tag.
            return new Object[][][]{{{
                    "Can anyone help me dig through OpenNLP's horrible documentation?"
            }}};
        }

        @Test(dataProvider = "getCorpusData")
        public void showPOS(Object[] input) throws IOException {
            // Fetch the model on first run so the test is self-contained.
            File modelFile = new File("en-pos-maxent.bin");
            if (!modelFile.exists()) {
                System.out.println("Downloading model.");
                download("http://opennlp.sourceforge.net/models-1.5/en-pos-maxent.bin", modelFile);
            }

            POSModel model = new POSModel(modelFile);
            PerformanceMonitor perfMon = new PerformanceMonitor(System.err, "sent");
            POSTaggerME tagger = new POSTaggerME(model);

            perfMon.start();
            Stream.of(input).map(line -> {
                // Split on whitespace only, then tag each token.
                String[] whitespaceTokenizerLine = WhitespaceTokenizer.INSTANCE.tokenize(line.toString());
                String[] tags = tagger.tag(whitespaceTokenizerLine);

                POSSample sample = new POSSample(whitespaceTokenizerLine, tags);
                perfMon.incrementCounter();
                return sample.toString();
            }).forEach(System.out::println);
            perfMon.stopAndPrintFinalResult();
        }
    }

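This code doesn't actually test anything (it's a smoke test, if anything), but it should serve as a starting point. Another (potentially) nice thing is that it downloads a model for you if you don't have it downloaded already.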
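The download-if-missing step can also be pulled out into a small standalone helper. The sketch below uses only the JDK; ModelFetcher and fetchIfMissing are hypothetical names chosen for illustration. Unlike the test's download method, it writes to a temporary file first and then moves it into place, which is an added assumption on my part so that an interrupted transfer doesn't leave a truncated model file that a later run would mistake for a complete one.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper; the class and method names are illustrative only.
class ModelFetcher {

    /**
     * Copies the content at source to destination unless destination already exists.
     * Returns true if a download happened, false if the existing file was kept.
     * The destination's parent directory must already exist.
     */
    static boolean fetchIfMissing(URL source, Path destination) throws IOException {
        if (Files.exists(destination)) {
            return false;
        }
        // Write next to the destination first, then move the finished file into place.
        Path parent = destination.toAbsolutePath().getParent();
        Path partial = Files.createTempFile(parent, "model", ".part");
        try (InputStream in = source.openStream()) {
            Files.copy(in, partial, StandardCopyOption.REPLACE_EXISTING);
            Files.move(partial, destination, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            // No-op after a successful move; cleans up after a failed transfer.
            Files.deleteIfExists(partial);
        }
        return true;
    }
}
```

In the test above, this would be called with the same URL and file name, for example ModelFetcher.fetchIfMissing(new URL("http://opennlp.sourceforge.net/models-1.5/en-pos-maxent.bin"), Paths.get("en-pos-maxent.bin")).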

2020-11-01
