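I'm trying to use the OpenNLP implementation of the maximum entropy classifier, but the documentation seems very sparse. Although the library is evidently designed to be easy to use, I can't find a single example and/or specification of the input file format (i.e. the training set).
Does anyone know where to find this, or a minimal training example?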
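OpenNLP's format is very flexible. To use the MaxEnt classifier in OpenNLP you need to go through a few steps: read the data file line by line, turn each line into an Event (a context of features plus an outcome), configure and initialize the trainer, and train.
Here is sample code with comments: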
package example;

import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

import opennlp.tools.ml.maxent.GISTrainer;
import opennlp.tools.ml.model.Event;
import opennlp.tools.ml.model.MaxentModel;
import opennlp.tools.tokenize.WhitespaceTokenizer;
import opennlp.tools.util.FilterObjectStream;
import opennlp.tools.util.MarkableFileInputStreamFactory;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;
import opennlp.tools.util.TrainingParameters;

public class ReadData {
    public static void main(String[] args) throws Exception {
        // this is the data file ...
        // the format is <LIST of FEATURES separated by spaces> <outcome>
        // change the file to fit your needs
        File f = new File("football.dat");

        // we need to create an ObjectStream of events for the trainer..
        // First create an InputStreamFactory -- given a file we can create an InputStream, required for resetting...
        MarkableFileInputStreamFactory factory = new MarkableFileInputStreamFactory(f);

        // create a PlainTextByLineStream -- Note: you can create your own stream that can handle binary files
        // or data that spans more than one line...
        ObjectStream<String> stream = new PlainTextByLineStream(factory, Charset.defaultCharset());

        // Now that you have a stream of strings, you need to convert it to a stream of events...
        // I use a custom FilterObjectStream which simply takes a line, breaks it up into tokens,
        // uses all except the last as the features [context] and the last token as the outcome class
        ObjectStream<Event> eventStream = new FilterObjectStream<String, Event>(stream) {
            @Override
            public Event read() throws IOException {
                String line = samples.read();
                if (line == null) return null;
                String[] parts = WhitespaceTokenizer.INSTANCE.tokenize(line);
                String[] context = Arrays.copyOf(parts, parts.length - 1);
                // debug output: outcome followed by its features
                System.out.println(parts[parts.length - 1] + " " + Arrays.toString(context));
                return new Event(parts[parts.length - 1], context);
            }
        };

        TrainingParameters parameters = new TrainingParameters();
        // By default OpenNLP uses a cutoff of 5 (a feature has to occur 5 times before it is used)
        // use 1 for my small dataset
        parameters.put(GISTrainer.CUTOFF_PARAM, 1);

        GISTrainer trainer = new GISTrainer();
        // the report map is supposed to mark when default values are assigned...
        Map<String, String> reportMap = new HashMap<>();
        // DON'T FORGET TO INITIALIZE THE TRAINER!!!
        trainer.init(parameters, reportMap);
        MaxentModel model = trainer.train(eventStream);

        // Now we have a model -- you should test on a test set, but
        // this is a toy example... so I am just resetting the event stream.
        eventStream.reset();
        Event evt = null;
        while ((evt = eventStream.read()) != null) {
            System.out.print(Arrays.toString(evt.getContext()) + ": ");
            // Evaluate the context from the event using our model.
            // you would want to calculate summary statistics..
            double[] p = model.eval(evt.getContext());
            System.out.print(model.getBestOutcome(p) + " ");
            if (model.getBestOutcome(p).equals(evt.getOutcome())) {
                System.out.println("CORRECT");
            } else {
                System.out.println("INCORRECT");
            }
        }
    }
}
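The evaluation loop prints CORRECT or INCORRECT per event, and its comment notes that you would want summary statistics. A minimal sketch of such a tally follows. It is plain Java with no OpenNLP dependency, and `AccuracyTally` with its methods is a hypothetical helper introduced here for illustration, not part of the library. The gold/predicted pairs in `main` are made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of OpenNLP): tallies overall and per-outcome accuracy.
final class AccuracyTally {
    private int total;
    private int correct;
    // gold outcome -> {number correct, number seen}
    private final Map<String, int[]> perOutcome = new LinkedHashMap<>();

    // Record one prediction against its gold (true) outcome.
    void add(String gold, String predicted) {
        int[] counts = perOutcome.computeIfAbsent(gold, k -> new int[2]);
        counts[1]++;
        total++;
        if (gold.equals(predicted)) {
            counts[0]++;
            correct++;
        }
    }

    // Fraction of all recorded predictions that matched; 0.0 if nothing was recorded.
    double accuracy() {
        return total == 0 ? 0.0 : (double) correct / total;
    }

    // Fraction of events with this gold outcome that were predicted correctly.
    double accuracyFor(String gold) {
        int[] counts = perOutcome.get(gold);
        return (counts == null || counts[1] == 0) ? 0.0 : (double) counts[0] / counts[1];
    }

    public static void main(String[] args) {
        AccuracyTally tally = new AccuracyTally();
        // Made-up gold/predicted pairs, for illustration only.
        tally.add("arsenal", "arsenal");
        tally.add("tie", "arsenal");
        tally.add("man_united", "man_united");
        tally.add("tie", "tie");
        System.out.println("overall accuracy: " + tally.accuracy());
        System.out.println("accuracy for tie: " + tally.accuracyFor("tie"));
    }
}
```

Inside the loop you would call something like `tally.add(evt.getOutcome(), model.getBestOutcome(p))` and print the totals after the loop. Since the toy example evaluates on its own training data, the resulting number says little about how the model generalizes.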
football.dat:
home=man_united Beckham=false Scholes=true Neville=true Henry=true Kanu=true Parlour=false Ferguson=confident Wengler=tense arsenal_lost_previous man_united_won_previous arsenal
home=man_united Beckham=true Scholes=false Neville=true Henry=false Kanu=true Parlour=false Ferguson=tense Wengler=confident arsenal_won_previous man_united_lost_previous man_united
home=man_united Beckham=false Scholes=true Neville=true Henry=true Kanu=true Parlour=false Ferguson=tense Wengler=tense arsenal_lost_previous man_united_won_previous tie
home=man_united Beckham=true Scholes=true Neville=false Henry=true Kanu=false Parlour=false Ferguson=confident Wengler=confident arsenal_won_previous man_united_won_previous tie
home=man_united Beckham=false Scholes=true Neville=true Henry=true Kanu=true Parlour=false Ferguson=confident Wengler=tense arsenal_won_previous man_united_won_previous arsenal
home=man_united Beckham=false Scholes=true Neville=true Henry=false Kanu=true Parlour=false Ferguson=confident Wengler=confident arsenal_won_previous man_united_won_previous man_united
home=man_united Beckham=true Scholes=true Neville=false Henry=true Kanu=true Parlour=false Ferguson=confident Wengler=tense arsenal_won_previous man_united_won_previous man_united
home=arsenal Beckham=false Scholes=true Neville=true Henry=true Kanu=true Parlour=false Ferguson=confident Wengler=tense arsenal_lost_previous man_united_won_previous arsenal
home=arsenal Beckham=true Scholes=false Neville=true Henry=false Kanu=true Parlour=false Ferguson=tense Wengler=confident arsenal_won_previous man_united_lost_previous arsenal
home=arsenal Beckham=false Scholes=true Neville=true Henry=true Kanu=true Parlour=false Ferguson=tense Wengler=tense arsenal_lost_previous man_united_won_previous tie
home=arsenal Beckham=true Scholes=true Neville=false Henry=true Kanu=false Parlour=false Ferguson=confident Wengler=confident arsenal_won_previous man_united_won_previous man_united
home=arsenal Beckham=false Scholes=true Neville=true Henry=true Kanu=true Parlour=false Ferguson=confident Wengler=tense arsenal_won_previous man_united_won_previous arsenal
home=arsenal Beckham=false Scholes=true Neville=true Henry=false Kanu=true Parlour=false Ferguson=confident Wengler=confident arsenal_won_previous man_united_won_previous man_united
home=arsenal Beckham=true Scholes=true Neville=false Henry=true Kanu=true Parlour=false Ferguson=confident Wengler=tense arsenal_won_previous man_united_won_previous arsenal
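To make the file format concrete: each line is a whitespace-separated list of feature tokens, and the last token on the line is the outcome. The sketch below shows that split in plain Java with no OpenNLP dependency. `LineFormat` and its methods are hypothetical names introduced here, and the sample line is shortened for readability.

```java
import java.util.Arrays;

// Hypothetical helper (not OpenNLP API): splits one data line into context and outcome,
// following the "<features separated by spaces> <outcome>" layout described in the answer.
final class LineFormat {
    // Split on runs of whitespace; an empty line yields a single empty token.
    static String[] tokens(String line) {
        return line.trim().split("\\s+");
    }

    // The outcome (class label) is the last token on the line.
    static String outcome(String line) {
        String[] t = tokens(line);
        return t[t.length - 1];
    }

    // The context (feature list) is every token except the last.
    static String[] context(String line) {
        String[] t = tokens(line);
        return Arrays.copyOf(t, t.length - 1);
    }

    public static void main(String[] args) {
        // Shortened sample line in the same layout as football.dat.
        String line = "home=arsenal Henry=true arsenal_won_previous arsenal";
        System.out.println("outcome: " + outcome(line));
        System.out.println("context: " + Arrays.toString(context(line)));
    }
}
```

Features are opaque strings to this split: `Henry=true` and `arsenal_won_previous` are each just one token, so any naming scheme works as long as a feature contains no whitespace.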
Hope this helps.