Question:

java.util.logging.Logger prints the message multiple times after a loop

关飞翼

2023-03-14

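I have a for loop, and after its block closes, the logger prints the same log message several times. I don't know what is happening there, but the message is repeated about as many times as the loop iterates, which is strange! In this case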

log.log(Level.INFO , "Translated Setences from "+countOfTranslated+" / "+sentenses.size()+" successfully");

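comes after the for loop block has closed, so the repetition makes no sense. If anyone knows what is going on, please share. Here is my full code: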

package mehritco.ir.megnatis.institute.reflex.nlp;


import java.util.logging.Level;
import java.util.logging.Logger;

import javax.annotation.Nullable;

import org.json.JSONObject;

import edu.stanford.nlp.coref.data.CorefChain;
import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.ie.util.*;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.semgraph.*;
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations;

import edu.stanford.nlp.util.CoreMap;
import mehritco.ir.megnatis.institute.reflex.instagram.Application;
import mehritco.ir.megnatis.institute.reflex.instagram.repository.RepositoryTranslate;
import mehritco.ir.megnatis.tools.file.BasicLocation;


import java.util.*;

/**
 * https://stanfordnlp.github.io/CoreNLP/human-languages.html
 * https://stanfordnlp.github.io/stanfordnlp/models.html
 * @author Megnatis
 *
 */
public class TextAnalyzer {
    public static String unTranslatedText = """
    درما در اینجا هستیم . سلام خوبی؟ وقت بخیر چیکار میکنی
تسلیت 
راستی یادت نره بیای . دوست دارم عزیزم
چه میشه کرد؟
اونجایی
            """;
    
    /**
     * @link https://stanfordnlp.github.io/CoreNLP/ssplit.html#sentence-splitting-from-java
     * @param paragraph Max-length is 1024 byte 
     * @return
     */
    public static ArrayList<String> breakParagraphToSentence(String paragraph){
        ArrayList<String> sentenses = new ArrayList<String>();
        Properties props = new Properties();
        
         props.setProperty("annotators", "tokenize, ssplit");
         /**
             * @Link https://stanfordnlp.github.io/CoreNLP/ssplit.html#options
             * Whether to treat newlines as sentence breaks. This property has 3 legal values.
             * “always” means that a newline is always a sentence break
             * (but there still may be multiple sentences per line).
             */
         props.setProperty("ssplit.newlineIsSentenceBreak", "always");
            /**
             * @link https://stanfordnlp.github.io/CoreNLP/tokenize.html
             * Java character offsets (stored in the CharacterOffset{Begin,End}Annotation) 
             * are in terms of 16-bit chars, with Unicode characters outside the Basic
             * Multilingual Plane encoded as two chars via a surrogate pair.
             * If emoji need to be handled, use true.
             */
         props.setProperty("tokenize.codepoint", "true");
         
         StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
         CoreDocument document = new CoreDocument(paragraph);
            pipeline.annotate(document);
            for (CoreSentence sentence : document.sentences()) {
                sentenses.add(sentence.text());//split sentence with newLine char
            }
        return sentenses;
    }
    /**
     * For create Paragraph from TranslatedText
     * @param sentenses
     * @param log
     * @return String translated Paragraph
     */
    @Nullable
    public static String getParagraphInTranslated(ArrayList<String> sentenses , Logger log) {
        //Check null sentence or not
                if(sentenses == null || sentenses.size() < 0 || sentenses.isEmpty()) {
                    return null ;
                }
                String allSentenceToGetter = "";
                int countOfTranslated = 0;
                for (String sentense : sentenses) {
                      RepositoryTranslate repositoryTranslate = new RepositoryTranslate();
                      Logger logger = Application.setupLog(BasicLocation.getBaseFileDir()+"logs");
                      JSONObject translateJson =  repositoryTranslate.translate(sentense, "fa", "en", logger);
                      boolean isTranslated = translateJson.optBoolean(RepositoryTranslate.IS_TRANSLATE);
                      if(isTranslated) {
                          String textFromTranslate = translateJson.optString(RepositoryTranslate.TRANSLATE_TEXT);
                          allSentenceToGetter += (textFromTranslate+"\n") ;
                          countOfTranslated++;
                      }
                }
                
                log.log(Level.INFO , "Translated Setences from "+countOfTranslated+" / "+sentenses.size()+" successfully");
                return allSentenceToGetter;
    }
    /**
     * @param sentenses Use breakParagraphToSentence() to get input
     */
    public static void analyzer(ArrayList<String> sens , Logger log) {
        String paragraphTranslated = null;
        //Check null sentence or not
        if(sens == null || sens.size() < 0 || sens.isEmpty()) {
            return ;
        }else {
            paragraphTranslated = getParagraphInTranslated(sens, log);
        }
        if(paragraphTranslated == null) {
            return;
        }
         Properties properties = new Properties();
         properties.setProperty("annotators", "tokenize, ssplit, pos, lemma, parse, sentiment");
         properties.setProperty("ssplit.newlineIsSentenceBreak", "always");
         properties.setProperty("tokenize.codepoint", "true");
         StanfordCoreNLP pipeline = new StanfordCoreNLP(properties);
        
         Annotation annotation = pipeline.process(paragraphTranslated);
            List<CoreMap> sentences = annotation.get(CoreAnnotations.SentencesAnnotation.class);

           // sentences
            for (CoreMap sentence : sentences) {
              String sentiment = sentence.get(SentimentCoreAnnotations.SentimentClass.class);
              System.out.println(sentiment + "\t" + sentence);
            }
            //Tokenizing process https://stanfordnlp.github.io/CoreNLP/tokenize.html
            CoreDocument doc = new CoreDocument(unTranslatedText);
            pipeline.annotate(doc);
//          for (CoreMap sentence : sentences) {
//            // Get the parse tree for each sentence
//            Tree parseTree = sentence.get(TreeCoreAnnotations.TreeAnnotation.class);
//            // Do something interesting with the parse tree!
//            System.out.println(parseTree);
//          }
            for (CoreLabel tok : doc.tokens()) {
                System.out.println(String.format("%s\t%d\t%d", tok.word(), tok.beginPosition(), tok.endPosition()));
              }
        
    }
    
    

      public static void main(String[] args)  {
          Logger logger = Application.setupLog(BasicLocation.getBaseFileDir()+"logs");
          analyzer(breakParagraphToSentence(unTranslatedText), logger);


        }

    }

The output is:


[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
 -> 2022/05/04 07:40:47.140 {INFO} [mehritco.ir.megnatis.institute.reflex.nlp.TextAnalyzer At Method :getParagraphInTranslated (Line :0) ]  : Translated Setences from 7 / 7 successfully
 -> 2022/05/04 07:40:47.150 {INFO} [mehritco.ir.megnatis.institute.reflex.nlp.TextAnalyzer At Method :getParagraphInTranslated (Line :0) ]  : Translated Setences from 7 / 7 successfully
 -> 2022/05/04 07:40:47.155 {INFO} [mehritco.ir.megnatis.institute.reflex.nlp.TextAnalyzer At Method :getParagraphInTranslated (Line :0) ]  : Translated Setences from 7 / 7 successfully
 -> 2022/05/04 07:40:47.162 {INFO} [mehritco.ir.megnatis.institute.reflex.nlp.TextAnalyzer At Method :getParagraphInTranslated (Line :0) ]  : Translated Setences from 7 / 7 successfully
 -> 2022/05/04 07:40:47.167 {INFO} [mehritco.ir.megnatis.institute.reflex.nlp.TextAnalyzer At Method :getParagraphInTranslated (Line :0) ]  : Translated Setences from 7 / 7 successfully
 -> 2022/05/04 07:40:47.172 {INFO} [mehritco.ir.megnatis.institute.reflex.nlp.TextAnalyzer At Method :getParagraphInTranslated (Line :0) ]  : Translated Setences from 7 / 7 successfully
 -> 2022/05/04 07:40:47.177 {INFO} [mehritco.ir.megnatis.institute.reflex.nlp.TextAnalyzer At Method :getParagraphInTranslated (Line :0) ]  : Translated Setences from 7 / 7 successfully
 -> 2022/05/04 07:40:47.182 {INFO} [mehritco.ir.megnatis.institute.reflex.nlp.TextAnalyzer At Method :getParagraphInTranslated (Line :0) ]  : Translated Setences from 7 / 7 successfully
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
[main] INFO edu.stanford.nlp.tagger.maxent.MaxentTagger - Loading POS tagger from edu/stanford/nlp/models/pos-tagger/english-left3words-distsim.tagger ... done [0.7 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator parse
[main] INFO edu.stanford.nlp.parser.common.ParserGrammar - Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ... done [0.7 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator sentiment
[main] INFO edu.stanford.nlp.sentiment.SentimentModel - Loading sentiment model edu/stanford/nlp/models/sentiment/sentiment.ser.gz ... done [0.1 sec].
Neutral We are here.
Neutral Hi how are you?
Neutral Good morning, what are you doing?
Neutral condolences
Negative    I really do not remember.
Positive    I love you baby
Neutral What can be done?
Neutral There
درما    1   5
در  6   8
اینجا   9   14
هستیم   15  20
.   21  22
سلام    23  27
خوبی    28  32
؟   32  33
وقت 34  37
بخیر    38  42
چیکار   43  48
میکنی   49  54
تسلیت   55  60
راستی   61  66
یادت    67  71
نره 72  75
بیای    76  80
.   81  82
دوست    83  87
دارم    88  92
عزیزم   93  98
چه  99  101
میشه    102 106
کرد 107 110
؟   110 111
اونجایی 112 119

The output should look like this:


[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
 -> 2022/05/04 07:40:47.140 {INFO} [mehritco.ir.megnatis.institute.reflex.nlp.TextAnalyzer At Method :getParagraphInTranslated (Line :0) ]  : Translated Setences from 7 / 7 successfully
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
[main] INFO edu.stanford.nlp.tagger.maxent.MaxentTagger - Loading POS tagger from edu/stanford/nlp/models/pos-tagger/english-left3words-distsim.tagger ... done [0.7 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator parse
[main] INFO edu.stanford.nlp.parser.common.ParserGrammar - Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ... done [0.7 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator sentiment
[main] INFO edu.stanford.nlp.sentiment.SentimentModel - Loading sentiment model edu/stanford/nlp/models/sentiment/sentiment.ser.gz ... done [0.1 sec].
Neutral We are here.
Neutral Hi how are you?
Neutral Good morning, what are you doing?
Neutral condolences
Negative    I really do not remember.
Positive    I love you baby
Neutral What can be done?
Neutral There
درما    1   5
در  6   8
اینجا   9   14
هستیم   15  20
.   21  22
سلام    23  27
خوبی    28  32
؟   32  33
وقت 34  37
بخیر    38  42
چیکار   43  48
میکنی   49  54
تسلیت   55  60
راستی   61  66
یادت    67  71
نره 72  75
بیای    76  80
.   81  82
دوست    83  87
دارم    88  92
عزیزم   93  98
چه  99  101
میشه    102 106
کرد 107 110
؟   110 111
اونجایی 112 119

微生曾琪

2023-03-14

Clearly, this line

    paragraphTranslated = getParagraphInTranslated(sens, log);

is executed 8 times.
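One way to check this claim without a debugger is to count the calls and print the caller. The sketch below is only an illustration: `CallCounter` and `translateAll` are hypothetical stand-ins for the method in the question, not part of the posted code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class CallCounter {
    // Counts how many times translateAll has been entered.
    private static final AtomicInteger CALLS = new AtomicInteger();

    /** Hypothetical stand-in for getParagraphInTranslated: joins the sentences. */
    static String translateAll(List<String> sentences) {
        int n = CALLS.incrementAndGet();
        StackTraceElement[] stack = Thread.currentThread().getStackTrace();
        // Frame 2 is usually the caller; guard in case the JVM omits frames.
        String caller = stack.length > 2 ? stack[2].getMethodName() : "unknown";
        System.err.println("translateAll call #" + n + " from " + caller);
        return String.join("\n", sentences);
    }

    /** Number of calls observed so far. */
    static int calls() {
        return CALLS.get();
    }

    public static void main(String[] args) {
        translateAll(Arrays.asList("first sentence", "second sentence"));
        System.err.println("total calls: " + calls());
    }
}
```

If the counter reports a single call while the message still appears several times, the repetition comes from somewhere other than repeated calls to the method.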

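This can happen if:

the text contains 8 paragraphs, or
your code has a general error in how it splits the text into paragraphs or sentences. In that case, you will find it by debugging.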
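While debugging, it may also be worth checking how many handlers are attached to the logger. The source of `Application.setupLog` is not shown in the question, so the following is only an assumption: if that method attaches a new handler to the same named logger on every call, then one call in `main` plus one call per loop iteration would make a single `log.log(...)` statement print once per handler. The sketch below uses a hypothetical `setupLogNaive` to reproduce that behavior with plain `java.util.logging`, and a guarded `setupLogOnce` as one possible way to avoid it.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

public class DuplicateHandlerDemo {
    /** Number of records published by all CountingHandler instances. */
    static final AtomicInteger PUBLISHED = new AtomicInteger();

    /** Handler that counts records instead of printing them. */
    static final class CountingHandler extends Handler {
        @Override public void publish(LogRecord record) { PUBLISHED.incrementAndGet(); }
        @Override public void flush() { }
        @Override public void close() { }
    }

    /** Hypothetical setup that attaches a new handler on every call (assumed behavior). */
    static Logger setupLogNaive(String name) {
        Logger logger = Logger.getLogger(name); // same instance for the same name
        logger.setUseParentHandlers(false);     // keep the root console handler out of the demo
        logger.addHandler(new CountingHandler());
        return logger;
    }

    /** Guarded variant: attaches a handler only if the logger has none yet. */
    static Logger setupLogOnce(String name) {
        Logger logger = Logger.getLogger(name);
        if (logger.getHandlers().length == 0) {
            logger.setUseParentHandlers(false);
            logger.addHandler(new CountingHandler());
        }
        return logger;
    }

    /** Runs the chosen setup `calls` times (calls >= 1), logs once, returns the publish count. */
    static int publishedAfterSetups(String name, int calls, boolean naive) {
        Logger logger = null;
        for (int i = 0; i < calls; i++) {
            logger = naive ? setupLogNaive(name) : setupLogOnce(name);
        }
        PUBLISHED.set(0);
        logger.log(Level.INFO, "Translated sentences");
        int published = PUBLISHED.get();
        for (Handler h : logger.getHandlers()) {
            logger.removeHandler(h); // leave the named logger clean for later runs
        }
        return published;
    }

    public static void main(String[] args) {
        System.out.println("naive: " + publishedAfterSetups("demo.naive", 8, true));
        System.out.println("once:  " + publishedAfterSetups("demo.once", 8, false));
    }
}
```

Under these assumptions, the naive variant publishes the single message 8 times and the guarded variant publishes it once. In the posted code, printing `log.getHandlers().length` just before the `log.log(...)` call would show whether this is what is happening; creating the logger once outside the for loop would be another way to avoid repeated setup.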
