Getting incorrect scores with SentiWordNet

微生俊健
2023-03-14
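Question

I'm using SentiWordNet for some sentiment analysis, following a post on how to use SentiWordNet. However, no matter what input I try, I always get a score of 0.0. Am I doing something wrong? Thanks!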

    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileReader;
    import java.util.HashMap;
    import java.util.Iterator;
    import java.util.Set;
    import java.util.Vector;

    public class SWN3 {
        private String pathToSWN = "C:\\Users\\Malcolm\\Desktop\\SentiWordNet_3.0.0\\home\\swn\\www\\admin\\dump\\SentiWordNet_3.0.0.txt";
        private HashMap<String, Double> _dict;

        public SWN3(){

            _dict = new HashMap<String, Double>();
            HashMap<String, Vector<Double>> _temp = new HashMap<String, Vector<Double>>();
            try{
                BufferedReader csv =  new BufferedReader(new FileReader(pathToSWN));
                String line = "";           
                while((line = csv.readLine()) != null)
                {
                    String[] data = line.split("\t");
                    Double score = Double.parseDouble(data[2])-Double.parseDouble(data[3]);
                    String[] words = data[4].split(" ");
                    for(String w:words)
                    {
                        String[] w_n = w.split("#");
                        w_n[0] += "#"+data[0];
                        int index = Integer.parseInt(w_n[1])-1;
                        if(_temp.containsKey(w_n[0]))
                        {
                            Vector<Double> v = _temp.get(w_n[0]);
                            if(index>v.size())
                                for(int i = v.size();i<index; i++)
                                    v.add(0.0);
                            v.add(index, score);
                            _temp.put(w_n[0], v);
                        }
                        else
                        {
                            Vector<Double> v = new Vector<Double>();
                            for(int i = 0;i<index; i++)
                                v.add(0.0);
                            v.add(index, score);
                            _temp.put(w_n[0], v);
                        }
                    }
                }
                Set<String> temp = _temp.keySet();
                for (Iterator<String> iterator = temp.iterator(); iterator.hasNext();) {
                    String word = (String) iterator.next();
                    Vector<Double> v = _temp.get(word);
                    double score = 0.0;
                    double sum = 0.0;
                    for(int i = 0; i < v.size(); i++)
                        score += ((double)1/(double)(i+1))*v.get(i);
                    for(int i = 1; i<=v.size(); i++)
                        sum += (double)1/(double)i;
                    score /= sum;
                    String sent = "";               
                    if(score>=0.75)
                        sent = "strong_positive";
                    else
                    if(score > 0.25 && score<=0.5)
                        sent = "positive";
                    else
                    if(score > 0 && score>=0.25)
                        sent = "weak_positive";
                    else
                    if(score < 0 && score>=-0.25)
                        sent = "weak_negative";
                    else
                    if(score < -0.25 && score>=-0.5)
                        sent = "negative";
                    else
                    if(score<=-0.75)
                        sent = "strong_negative";
                    _dict.put(word, score);
                }
            }
            catch(Exception e){e.printStackTrace();}        
        }

        public Double extract(String word)
        {
            Double total = new Double(0);
            if(_dict.get(word+"#n") != null)
                total = _dict.get(word+"#n") + total;
            if(_dict.get(word+"#a") != null)
                total = _dict.get(word+"#a") + total;
            if(_dict.get(word+"#r") != null)
                total = _dict.get(word+"#r") + total;
            if(_dict.get(word+"#v") != null)
                total = _dict.get(word+"#v") + total;
            return total;
        }

        public static void main(String[] args) {
            SWN3 test = new SWN3();
            String sentence="Hello have a Super awesome great day";
            String[] words = sentence.split("\\s+");
            double totalScore = 0;
            for(String word : words) {
                word = word.replaceAll("([^a-zA-Z\\s])", "");
                if (test.extract(word) == null)
                    continue;
                totalScore += test.extract(word);
            }
            System.out.println(totalScore);
        }

    }

Here are the first 10 lines of SentiWordNet.txt:

a   00001740    0.125   0   able#1  (usually followed by `to') having the necessary means or skill or know-how or authority to do something; "able to swim"; "she was able to program her computer"; "we were at last able to buy a car"; "able to get a grant for the project"
a   00002098    0   0.75    unable#1    (usually followed by `to') not having the necessary means or skill or know-how; "unable to get to town without a car"; "unable to obtain funds"
a   00002312    0   0   dorsal#2 abaxial#1  facing away from the axis of an organ or organism; "the abaxial surface of a leaf is the underside or side facing away from the stem"
a   00002527    0   0   ventral#2 adaxial#1 nearest to or facing toward the axis of an organ or organism; "the upper side of a leaf is known as the adaxial surface"
a   00002730    0   0   acroscopic#1    facing or on the side toward the apex
a   00002843    0   0   basiscopic#1    facing or on the side toward the base
a   00002956    0   0   abducting#1 abducent#1  especially of muscles; drawing away from the midline of the body or from an adjacent part
a   00003131    0   0   adductive#1 adducting#1 adducent#1  especially of muscles; bringing together or drawing toward the midline of the body or toward an adjacent part
a   00003356    0   0   nascent#1   being born or beginning; "the nascent chicks"; "a nascent insurgency"
a   00003553    0   0   emerging#2 emergent#2   coming into existence; "an emergent republic"

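Answer:

The SentiWordNet text file usually ships in an awkward format.

You need to delete its first part (the comments and description) and its last two lines: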

#
EMPTY LINE

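The parser does not know how to handle those lines. In the code above, the try block wraps the entire read loop, so the first line that fails to parse throws an exception, loading stops, and _dict is left empty. That is why extract returns 0.0 for every word. Once the header and those last two lines are removed, the file loads correctly.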
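As an alternative to editing the file by hand, the loader could skip the lines it cannot parse. The following is a minimal sketch of that idea, not the original poster's code: the class SwnLineFilter and its methods isDataLine and dataRows are hypothetical names introduced here, and the sketch assumes data rows are tab-separated with at least five columns (POS, ID, PosScore, NegScore, SynsetTerms), as in the sample rows shown in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of SentiWordNet or of the SWN3 class above.
class SwnLineFilter {

    // True only for lines that look like data rows:
    // POS, ID, PosScore, NegScore, SynsetTerms, then the gloss (tab-separated).
    static boolean isDataLine(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty() || trimmed.startsWith("#")) {
            return false; // header comments, the trailing "#", blank lines
        }
        return line.split("\t").length >= 5;
    }

    // Keeps the data rows, split into columns, and drops everything else.
    static List<String[]> dataRows(List<String> lines) {
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            if (isDataLine(line)) {
                rows.add(line.split("\t"));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "# header comment",
                "a\t00001740\t0.125\t0\table#1\tgloss text",
                "#",
                "");
        List<String[]> rows = dataRows(sample);
        // Same score formula as the question: PosScore minus NegScore.
        double score = Double.parseDouble(rows.get(0)[2]) - Double.parseDouble(rows.get(0)[3]);
        System.out.println(rows.size() + " row, score " + score); // prints "1 row, score 0.125"
    }
}
```

In the question's constructor, the same check could go right after readLine: when isDataLine(line) is false, continue to the next line instead of splitting and parsing it.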


