Question:

Java HtmlUnit WebClient ScriptException error

莘睿
2023-03-14

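I am using HtmlUnit to build a website scraper, with version htmlunit-2.19. I know this is a duplicate question, but believe me, I have tried every solution I could find on Google and I still get this exception. Please see the exception below: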

com.gargoylesoftware.htmlunit.ScriptException: ReferenceError: "jQuery" is not defined. (URL/lib/dropdown/core.js#3)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:954) [htmlunit-2.19.jar:2.19]
    at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:628) [htmlunit-core-js-2.17.jar:na]
    at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:513) [htmlunit-core-js-2.17.jar:na]
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:836) [htmlunit-2.19.jar:2.19]
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:812) [htmlunit-2.19.jar:2.19]
    at com.gargoylesoftware.htmlunit.html.HtmlPage.loadExternalJavaScriptFile(HtmlPage.java:997) [htmlunit-2.19.jar:2.19]
    at com.gargoylesoftware.htmlunit.html.HtmlScript.executeScriptIfNeeded(HtmlScript.java:399) [htmlunit-2.19.jar:2.19]
    at com.gargoylesoftware.htmlunit.html.HtmlScript$3.execute(HtmlScript.java:277) [htmlunit-2.19.jar:2.19]
    at com.gargoylesoftware.htmlunit.html.HtmlScript.onAllChildrenAddedToPage(HtmlScript.java:293) [htmlunit-2.19.jar:2.19]
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement(HTMLParser.java:799) [htmlunit-2.19.jar:2.19]
    at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source) [xercesImpl-2.11.0.jar:na]
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement(HTMLParser.java:756) [htmlunit-2.19.jar:2.19]
    at org.cyberneko.html.HTMLTagBalancer.callEndElement(HTMLTagBalancer.java:1170) [nekohtml-1.9.22.jar:1.9.22]
    at org.cyberneko.html.HTMLTagBalancer.endElement(HTMLTagBalancer.java:1072) [nekohtml-1.9.22.jar:1.9.22]
    at org.cyberneko.html.filters.DefaultFilter.endElement(DefaultFilter.java:206) [nekohtml-1.9.22.jar:na]
    at org.cyberneko.html.filters.NamespaceBinder.endElement(NamespaceBinder.java:330) [nekohtml-1.9.22.jar:na]
    at org.cyberneko.html.HTMLScanner$ContentScanner.scanEndElement(HTMLScanner.java:3126) [nekohtml-1.9.22.jar:1.9.22]
    at org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:2093) [nekohtml-1.9.22.jar:1.9.22]
    at org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:920) [nekohtml-1.9.22.jar:1.9.22]
    at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:499) [nekohtml-1.9.22.jar:1.9.22]
    at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:452) [nekohtml-1.9.22.jar:1.9.22]
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source) [xercesImpl-2.11.0.jar:na]
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.parse(HTMLParser.java:1039) [htmlunit-2.19.jar:2.19]
    at com.gargoylesoftware.htmlunit.html.HTMLParser.parse(HTMLParser.java:252) [htmlunit-2.19.jar:2.19]
    at com.gargoylesoftware.htmlunit.html.HTMLParser.parseHtml(HTMLParser.java:198) [htmlunit-2.19.jar:2.19]
    at com.gargoylesoftware.htmlunit.DefaultPageCreator.createHtmlPage(DefaultPageCreator.java:271) [htmlunit-2.19.jar:2.19]
    at com.gargoylesoftware.htmlunit.DefaultPageCreator.createPage(DefaultPageCreator.java:159) [htmlunit-2.19.jar:2.19]
    at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseInto(WebClient.java:478) [htmlunit-2.19.jar:2.19]
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:352) [htmlunit-2.19.jar:2.19]
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:417) [htmlunit-2.19.jar:2.19]
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:402) [htmlunit-2.19.jar:2.19]
    at com.company.dashboard.service.impl.ReverseServiceImpl.loginToAds(ReverseServiceImpl.java:447) [classes/:na]
    at com.company.dashboard.service.impl.ReverseServiceImpl.loginToAds(ReverseServiceImpl.java:462) [classes/:na]
    at com.company.dashboard.service.impl.ReverseServiceImpl.getKeyword(ReverseServiceImpl.java:502) [classes/:na]
    at com.company.dashboard.service.impl.ReverseServiceImpl.handleReverseBySetting(ReverseServiceImpl.java:879) [classes/:na]
    at com.company.dashboard.thread.ConCurrentRunnable.run(ConCurrentRunnable.java:44) [classes/:na]
    at com.company.dashboard.thread.CustomThreadPool$WorkerThread.run(CustomThreadPool.java:53) [classes/:na]
Caused by: net.sourceforge.htmlunit.corejs.javascript.EcmaError: ReferenceError: "jQuery" is not defined. (URL/lib/dropdown/core.js#3)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.constructError(ScriptRuntime.java:3935) ~[htmlunit-core-js-2.17.jar:na]
    at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.constructError(ScriptRuntime.java:3919) ~[htmlunit-core-js-2.17.jar:na]
    at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.notFoundError(ScriptRuntime.java:3996) ~[htmlunit-core-js-2.17.jar:na]
    at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.name(ScriptRuntime.java:1846) ~[htmlunit-core-js-2.17.jar:na]
    at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpretLoop(Interpreter.java:1627) ~[htmlunit-core-js-2.17.jar:na]
    at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpret(Interpreter.java:798) ~[htmlunit-core-js-2.17.jar:na]
    at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:105) ~[htmlunit-core-js-2.17.jar:na]
    at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.doTopCall(ContextFactory.java:411) [htmlunit-core-js-2.17.jar:na]
    at com.gargoylesoftware.htmlunit.javascript.HtmlUnitContextFactory.doTopCall(HtmlUnitContextFactory.java:309) ~[htmlunit-2.19.jar:2.19]
    at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.doTopCall(ScriptRuntime.java:3286) ~[htmlunit-core-js-2.17.jar:na]
    at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.exec(InterpretedFunction.java:115) ~[htmlunit-core-js-2.17.jar:na]
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$3.doRun(JavaScriptEngine.java:827) ~[htmlunit-2.19.jar:2.19]
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:939) [htmlunit-2.19.jar:2.19]
    ... 36 common frames omitted

2019-07-13 11:06:01.078  INFO 5686 --- [       Thread-4] c.g.h.javascript.JavaScriptEngine        : Caught script exception
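The `ReferenceError` says that no global `jQuery` existed yet when line 3 of `lib/dropdown/core.js` executed. One cause worth ruling out is that the page references the dropdown script before (or without) a jQuery script. The sketch below is a rough, regex-based check, not a real HTML parser: it lists external `<script src>` URLs in document order and ignores inline or dynamically added scripts. The class and method names are hypothetical, and recognizing jQuery by the substring `jquery` in its URL is an assumption about how the site names the file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ScriptOrderCheck {
    // Rough pattern for <script ... src="..."> tags; NOT a real HTML parser.
    // Inline scripts and scripts inserted by JavaScript are not seen.
    private static final Pattern SCRIPT_SRC = Pattern.compile(
            "<script\\b[^>]*\\bsrc\\s*=\\s*[\"']([^\"']+)[\"']",
            Pattern.CASE_INSENSITIVE);

    /** Returns the src values of external scripts in the order they appear in the HTML. */
    public static List<String> scriptSources(String html) {
        List<String> sources = new ArrayList<>();
        Matcher m = SCRIPT_SRC.matcher(html);
        while (m.find()) {
            sources.add(m.group(1));
        }
        return sources;
    }

    /**
     * Returns true if a script whose URL contains "jquery" (assumed naming) appears
     * before the first script whose URL contains dependentMarker; returns false
     * otherwise, including when the dependent script is not referenced at all.
     */
    public static boolean jqueryLoadsBefore(String html, String dependentMarker) {
        boolean seenJquery = false;
        for (String src : scriptSources(html)) {
            // Check the dependent script first so its own URL never counts as jQuery.
            if (src.contains(dependentMarker)) {
                return seenJquery;
            }
            if (src.toLowerCase(Locale.ROOT).contains("jquery")) {
                seenJquery = true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String html = "<head><script src=\"lib/dropdown/core.js\"></script>"
                + "<script src=\"js/jquery.min.js\"></script></head>";
        System.out.println(scriptSources(html));                         // [lib/dropdown/core.js, js/jquery.min.js]
        System.out.println(jqueryLoadsBefore(html, "dropdown/core.js")); // false
    }
}
```

The HTML to feed in could come from the real browser's "view source". If the order looks correct, the jQuery file may instead have failed to download or run inside HtmlUnit; a failed script download is reported to `JavaScriptErrorListener.loadScriptError`, which the listener in Solution 1 below leaves empty.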

I researched this on Google and found many solutions for this exception. I tried all of them, but none of them worked.

Below are the solutions I have already applied:

Solution 1:

WebClient webClient = new WebClient(BrowserVersion.FIREFOX_38);

// Ignore HtmlUnit's "incorrectness" notices (non-standard HTML/JS usage).
webClient.setIncorrectnessListener(new IncorrectnessListener() {
    @Override
    public void notify(String message, Object origin) {
        // intentionally empty
    }
});

// Swallow CSS parsing errors without reporting them.
webClient.setCssErrorHandler(new SilentCssErrorHandler());

// Do nothing with the JavaScript problems reported to this listener.
webClient.setJavaScriptErrorListener(new JavaScriptErrorListener() {
    @Override
    public void scriptException(InteractivePage page, ScriptException scriptException) {
        // intentionally empty
    }

    @Override
    public void timeoutError(InteractivePage page, long allowedTime, long executionTime) {
        // intentionally empty
    }

    @Override
    public void malformedScriptURL(InteractivePage page, String url,
            MalformedURLException malformedURLException) {
        // intentionally empty
    }

    @Override
    public void loadScriptError(InteractivePage page, URL scriptUrl, Exception exception) {
        // intentionally empty
    }
});

// Ignore HTML parser errors and warnings.
webClient.setHTMLParserListener(new HTMLParserListener() {
    @Override
    public void error(String message, URL url, String html, int line, int column, String key) {
        // intentionally empty
    }

    @Override
    public void warning(String message, URL url, String html, int line, int column, String key) {
        // intentionally empty
    }
});

Solution 2:

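webClient.getOptions().setCssEnabled(false);                        // do not process CSS
webClient.getOptions().setJavaScriptEnabled(true);                  // keep JavaScript on (the site needs it)
webClient.getOptions().setThrowExceptionOnScriptError(false);       // do not throw on unhandled JS errors
webClient.getOptions().setThrowExceptionOnFailingStatusCode(false); // do not throw on failing HTTP status codes
webClient.getOptions().setPrintContentOnFailingStatusCode(false);   // do not log the body of failing responses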

I also found solutions that call setJavaScriptEnabled(false), but I need JavaScript enabled: without it I cannot scrape the site, so JavaScript has to stay on.

Please let me know if something is missing from my code.


1 answer

山翼
2023-03-14

Without knowing the page and more details about your code, I can only offer a few suggestions:

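  • Your HtmlUnit version is really outdated (2.19 is from November 12, 2015); we are at 2.35.0 now. Please use the latest version.
  • Check the log of a real browser to see whether the same errors occur there as well.
  • webClient.getOptions().setThrowExceptionOnScriptError(false); changes HtmlUnit's behavior so that it does not throw an exception when it detects an unhandled JS exception. This is more or less how real browsers handle JS exceptions. But (just like real browsers) HtmlUnit still logs the exception. If you don't want to see it, you have to configure your logger.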
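The "configure your logger" advice can be sketched with `java.util.logging`, under the assumption that HtmlUnit's commons-logging output ends up in JUL (commons-logging's fallback when no other backend such as Log4j is found). The logger name is expanded from the `c.g.h.javascript.JavaScriptEngine` entry in the question's log; `QuietHtmlUnitLogging` and `silenceScriptErrors` are hypothetical names. This only hides the log output; `setThrowExceptionOnScriptError(false)` is still what keeps the exception from being thrown.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class QuietHtmlUnitLogging {
    // Expanded from "c.g.h.javascript.JavaScriptEngine" in the question's log line.
    private static final String JS_ENGINE_LOGGER =
            "com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine";

    // Keep a strong reference: java.util.logging may garbage-collect loggers that
    // nothing references, which would silently discard the level set below.
    private static final Logger JS_LOGGER = Logger.getLogger(JS_ENGINE_LOGGER);

    /** Turns off this one logger (assumes HtmlUnit's logging is routed to JUL). */
    public static void silenceScriptErrors() {
        JS_LOGGER.setLevel(Level.OFF);
    }

    public static void main(String[] args) {
        // Call once at startup, before creating the WebClient.
        silenceScriptErrors();
        System.out.println(JS_LOGGER.getLevel()); // OFF
    }
}
```

The log format in the question looks like Spring Boot's default Logback output; if so, plain JUL settings would likely have no effect, and the equivalent would be a `logging.level.com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine=OFF` entry in `application.properties`.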