Question:

Pig Latin joda-time error with StanfordCoreNLP

沈永新

2023-03-14

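I am trying to create a Pig UDF that extracts the locations mentioned in a tweet, using the Stanford CoreNLP package through the sista Scala API. It works fine when run locally with "sbt run", but it throws a "java.lang.NoSuchMethodError" when called from Pig: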

[...] from tagger edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger
[...] from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger [...]
Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... 2013-06-14 10:48:02,108 [Low Memory Detector] INFO org.apache.pig.impl.util.SpillableMemoryManager - first memory handler call - Collection threshold init = 18546688(18112K) used = 358671232(350264K) committed = 366542848(357952K) max = 699072512(682688K)
done [5.0 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... 2013-06-14 10:48:10,522 [Low Memory Detector] INFO org.apache.pig.impl.util.SpillableMemoryManager - first memory handler call - Usage threshold init = 18546688(18112K) used = 590012928(576184K) committed = 597786624(583776K) max = 699072512(682688K)
done [5.6 sec].
2013-06-14 10:48:11,469 [Thread-11] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0001
java.lang.NoSuchMethodError: org.joda.time.Duration.compareTo(Lorg/joda/time/ReadableDuration;)I
    at edu.stanford.nlp.time.SUTime$Duration.compareTo(SUTime.java:3406)
    at edu.stanford.nlp.time.SUTime$Duration.max([...])
    at edu.stanford.nlp.time.SUTime$Range.(SUTime.java:3793)
    at edu.stanford.nlp.time.SUTime.(SUTime.java:570)

Here is the relevant code:

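import org.apache.pig.EvalFunc
import org.apache.pig.data.Tuple

// Maps the locations found in a tweet to a space-separated string of countries.
object CountryTokenizer {
  def tokenize(text: String): String = {
    val locations = TweetEntityExtractor.NERLocationFilter(text)
    println(locations) // debug output
    // Cities.country is not shown here; presumably a project-local lookup.
    locations.map(x => Cities.country(x)).flatten.mkString(" ")
  }
}

// Pig UDF wrapper around CountryTokenizer.
class PigCountryTokenizer extends EvalFunc[String] {
  override def exec(tuple: Tuple): java.lang.String = {
    // Util.cast is not shown here; presumably a project-local helper.
    val text: java.lang.String = Util.cast[java.lang.String](tuple.get(0))
    CountryTokenizer.tokenize(text)
  }
}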

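// Package names assume the edu.arizona.sista layout of the sista processors library.
import edu.arizona.sista.processors.Processor
import edu.arizona.sista.processors.corenlp.CoreNLPProcessor

object TweetEntityExtractor {
  // A single processor instance shared by all calls.
  val processor: Processor = new CoreNLPProcessor()

  // Returns every token that the NER step labels as LOCATION.
  def NERLocationFilter(text: String): List[String] = {
    val doc = processor.mkDocument(text)

    processor.tagPartsOfSpeech(doc)
    processor.lemmatize(doc)
    processor.recognizeNamedEntities(doc)

    val locations = doc.sentences.map(sentence => {
      // entities is an Option; fall back to an empty list when there are no labels.
      val entities = sentence.entities.map(_.toList).getOrElse(Nil)
      val words = sentence.words.toList

      (words zip entities).filter(x => {
        x._1 != "" && x._2 == "LOCATION"
      }).map(_._1)
    })
    locations.toList.flatten
  }
}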

I am using sbt-assembly to build a fat jar, so the joda-time jar should be accessible. What is going on?
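One way to check the assumption that the bundled joda-time is the copy actually in use is to ask the JVM where it loaded the class from. The following is a minimal sketch, not part of the original code: `ClasspathProbe`, `locationOf` and `hasMethod` are hypothetical names, and the probe only uses standard reflection calls. To be meaningful it would have to run inside the same JVM and class loader as the UDF, for example by calling it from `exec`.

```scala
import scala.util.Try

// Hypothetical helper for diagnosing which copy of a class is on the classpath.
object ClasspathProbe {

  // Jar or directory the class was loaded from. None if the class cannot be
  // found or if it has no code source (as with JDK bootstrap classes).
  def locationOf(className: String): Option[String] =
    Try(Class.forName(className)).toOption.flatMap { cls =>
      Option(cls.getProtectionDomain.getCodeSource)
        .flatMap(cs => Option(cs.getLocation))
        .map(_.toString)
    }

  // True if the loaded class exposes a public method with exactly these
  // parameter types (given as fully qualified class names).
  def hasMethod(className: String, method: String, paramTypes: String*): Boolean =
    Try {
      val cls = Class.forName(className)
      val params: Seq[Class[_]] = paramTypes.map(name => Class.forName(name): Class[_])
      cls.getMethod(method, params: _*)
    }.isSuccess

  def main(args: Array[String]): Unit = {
    val duration = "org.joda.time.Duration"
    println("Duration loaded from: " +
      locationOf(duration).getOrElse("<not found or no code source>"))
    println("compareTo(ReadableDuration) present: " +
      hasMethod(duration, "compareTo", "org.joda.time.ReadableDuration"))
  }
}
```

If the printed location points at a jar other than the fat jar, or the method check reports false, the class is being resolved from a different joda-time than the one the build bundled.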

1 answer

糜单弓

2023-03-14

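Pig ships with its own joda-time (1.6), which is not compatible with 2.x. The NoSuchMethodError indicates that Pig's copy is being picked up ahead of the one in your fat jar: SUTime calls Duration.compareTo(ReadableDuration), and the Duration class that was actually loaded does not have that method.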
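A common way around this kind of clash is to relocate (shade) the joda-time packages inside the fat jar, so that CoreNLP refers to its own renamed copy and never sees Pig's. The fragment below is a sketch under stated assumptions rather than a tested fix for this project: it assumes sbt 1.x syntax and an sbt-assembly version with shading support (0.14.0 or later), and the target prefix `shaded.org.joda.time` is an arbitrary, hypothetical name.

```scala
// build.sbt
// Assumes the sbt-assembly plugin (0.14.0+) is already enabled in project/plugins.sbt.
assembly / assemblyShadeRules := Seq(
  // Rename every class under org.joda.time, and every reference to it in all
  // bundled dependencies, to a private package that cannot collide with Pig's copy.
  // "shaded.org.joda.time" is a made-up prefix; any unused package name would do.
  ShadeRule.rename("org.joda.time.**" -> "shaded.org.joda.time.@1").inAll
)
```

joda-time loads its time-zone data as classpath resources, so it is worth verifying that date handling still works after relocation before relying on this.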
