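I'm very new to HtmlUnit, and I'm trying to scrape a website that uses JavaScript to modify its markup. I've heard that HtmlUnit is the best way to do this, because it uses a headless browser and returns the final code.
However, as you'll see, I can't even create an HtmlPage object without a huge, incomprehensible exception being thrown (incomprehensible at least given my practically zero experience with HtmlUnit).
Here is my code: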
import java.io.IOException;

import com.gargoylesoftware.htmlunit.*;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class Main {
    public static void main(String[] args) {
        Main scraper = new Main();
        scraper.testingGargoyle();
    }

    private void testingGargoyle() {
        String myUrl = "https://www.wearvr.com/#game_id=game_4";
        WebClient webClient = new WebClient();
        try {
            // Loading the page also runs its scripts; this call is where the exception is thrown
            HtmlPage myPage = ((HtmlPage) webClient.getPage(myUrl));
        } catch (FailingHttpStatusCodeException | IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }
}
Apr 30, 2015 5:43:50 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'application/x-javascript'.
Apr 30, 2015 5:43:50 PM com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter runtimeError
SEVERE: runtimeError: message=[The data necessary to complete this operation is not yet available.] sourceName=[https://load.sumome.com/] line=[1] lineSource=[null] lineOffset=[0]
Exception in thread "main" ======= EXCEPTION START ========
EcmaError: lineNumber=[19] column=[0] lineSource=[<no source>] name=[TypeError] sourceName=[https://www.wearvr.com/assets/scripts/bundle.b4038a088bb1abfcf55c.js] message=[TypeError: Cannot find function bind in object function (e, n, r) {...}. (https://www.wearvr.com/assets/scripts/bundle.b4038a088bb1abfcf55c.js#19)]
com.gargoylesoftware.htmlunit.ScriptException: TypeError: Cannot find function bind in object function (e, n, r) {...}. (https://www.wearvr.com/assets/scripts/bundle.b4038a088bb1abfcf55c.js#19)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:847)
at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:620)
at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:513)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:733)
at com.gargoylesoftware.htmlunit.html.HtmlPage.loadExternalJavaScriptFile(HtmlPage.java:1096)
at com.gargoylesoftware.htmlunit.html.HtmlScript.executeScriptIfNeeded(HtmlScript.java:395)
at com.gargoylesoftware.htmlunit.html.HtmlScript$3.execute(HtmlScript.java:270)
at com.gargoylesoftware.htmlunit.html.HtmlScript.onAllChildrenAddedToPage(HtmlScript.java:290)
at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement(HTMLParser.java:793)
at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement(HTMLParser.java:751)
at org.cyberneko.html.HTMLTagBalancer.callEndElement(HTMLTagBalancer.java:1170)
at org.cyberneko.html.HTMLTagBalancer.endElement(HTMLTagBalancer.java:1072)
at org.cyberneko.html.filters.DefaultFilter.endElement(DefaultFilter.java:206)
at org.cyberneko.html.filters.NamespaceBinder.endElement(NamespaceBinder.java:330)
at org.cyberneko.html.HTMLScanner$ContentScanner.scanEndElement(HTMLScanner.java:3126)
at org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:2093)
at org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:920)
at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:499)
at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:452)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.parse(HTMLParser.java:1017)
at com.gargoylesoftware.htmlunit.html.HTMLParser.parse(HTMLParser.java:248)
at com.gargoylesoftware.htmlunit.html.HTMLParser.parseHtml(HTMLParser.java:194)
at com.gargoylesoftware.htmlunit.DefaultPageCreator.createHtmlPage(DefaultPageCreator.java:268)
at com.gargoylesoftware.htmlunit.DefaultPageCreator.createPage(DefaultPageCreator.java:156)
at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseInto(WebClient.java:471)
at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:345)
at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:410)
at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:395)
at Main.testingGargoyle(Main.java:19)
at Main.main(Main.java:10)
Caused by: net.sourceforge.htmlunit.corejs.javascript.EcmaError: TypeError: Cannot find function bind in object function (e, n, r) {...}. (https://www.wearvr.com/assets/scripts/bundle.b4038a088bb1abfcf55c.js#19)
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.constructError(ScriptRuntime.java:3629)
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.constructError(ScriptRuntime.java:3613)
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.typeError(ScriptRuntime.java:3634)
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.typeError2(ScriptRuntime.java:3650)
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.notFunctionError(ScriptRuntime.java:3714)
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.getPropFunctionAndThisHelper(ScriptRuntime.java:2233)
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.getPropFunctionAndThis(ScriptRuntime.java:2215)
at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpretLoop(Interpreter.java:1333)
at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpret(Interpreter.java:798)
at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:105)
at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.doTopCall(ContextFactory.java:411)
at com.gargoylesoftware.htmlunit.javascript.HtmlUnitContextFactory.doTopCall(HtmlUnitContextFactory.java:309)
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.doTopCall(ScriptRuntime.java:3057)
at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.exec(InterpretedFunction.java:115)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$3.doRun(JavaScriptEngine.java:724)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:832)
... 31 more
Enclosed exception:
net.sourceforge.htmlunit.corejs.javascript.EcmaError: TypeError: Cannot find function bind in object function (e, n, r) {...}. (https://www.wearvr.com/assets/scripts/bundle.b4038a088bb1abfcf55c.js#19)
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.constructError(ScriptRuntime.java:3629)
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.constructError(ScriptRuntime.java:3613)
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.typeError(ScriptRuntime.java:3634)
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.typeError2(ScriptRuntime.java:3650)
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.notFunctionError(ScriptRuntime.java:3714)
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.getPropFunctionAndThisHelper(ScriptRuntime.java:2233)
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.getPropFunctionAndThis(ScriptRuntime.java:2215)
at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpretLoop(Interpreter.java:1333)
at script(https://www.wearvr.com/assets/scripts/bundle.b4038a088bb1abfcf55c.js:19)
at script.r(https://www.wearvr.com/assets/scripts/bundle.b4038a088bb1abfcf55c.js:16)
at script.r(https://www.wearvr.com/assets/scripts/bundle.b4038a088bb1abfcf55c.js:384)
at script(https://www.wearvr.com/assets/scripts/bundle.b4038a088bb1abfcf55c.js:16)
at script(https://www.wearvr.com/assets/scripts/bundle.b4038a088bb1abfcf55c.js:16)
at script.t(https://www.wearvr.com/assets/scripts/bundle.b4038a088bb1abfcf55c.js:1)
at script(https://www.wearvr.com/assets/scripts/bundle.b4038a088bb1abfcf55c.js:16)
at script(https://www.wearvr.com/assets/scripts/bundle.b4038a088bb1abfcf55c.js:16)
at script.t(https://www.wearvr.com/assets/scripts/bundle.b4038a088bb1abfcf55c.js:1)
at script(https://www.wearvr.com/assets/scripts/bundle.b4038a088bb1abfcf55c.js:7)
at script.t(https://www.wearvr.com/assets/scripts/bundle.b4038a088bb1abfcf55c.js:1)
at script(https://www.wearvr.com/assets/scripts/bundle.b4038a088bb1abfcf55c.js:463)
at script(https://www.wearvr.com/assets/scripts/bundle.b4038a088bb1abfcf55c.js:463)
at script.t(https://www.wearvr.com/assets/scripts/bundle.b4038a088bb1abfcf55c.js:1)
at script(https://www.wearvr.com/assets/scripts/bundle.b4038a088bb1abfcf55c.js:1)
at script.t(https://www.wearvr.com/assets/scripts/bundle.b4038a088bb1abfcf55c.js:1)
at script(https://www.wearvr.com/assets/scripts/bundle.b4038a088bb1abfcf55c.js:1)
at script(https://www.wearvr.com/assets/scripts/bundle.b4038a088bb1abfcf55c.js:1)
at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpret(Interpreter.java:798)
at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:105)
at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.doTopCall(ContextFactory.java:411)
at com.gargoylesoftware.htmlunit.javascript.HtmlUnitContextFactory.doTopCall(HtmlUnitContextFactory.java:309)
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.doTopCall(ScriptRuntime.java:3057)
at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.exec(InterpretedFunction.java:115)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$3.doRun(JavaScriptEngine.java:724)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:832)
at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:620)
at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:513)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:733)
at com.gargoylesoftware.htmlunit.html.HtmlPage.loadExternalJavaScriptFile(HtmlPage.java:1096)
at com.gargoylesoftware.htmlunit.html.HtmlScript.executeScriptIfNeeded(HtmlScript.java:395)
at com.gargoylesoftware.htmlunit.html.HtmlScript$3.execute(HtmlScript.java:270)
at com.gargoylesoftware.htmlunit.html.HtmlScript.onAllChildrenAddedToPage(HtmlScript.java:290)
at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement(HTMLParser.java:793)
at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement(HTMLParser.java:751)
at org.cyberneko.html.HTMLTagBalancer.callEndElement(HTMLTagBalancer.java:1170)
at org.cyberneko.html.HTMLTagBalancer.endElement(HTMLTagBalancer.java:1072)
at org.cyberneko.html.filters.DefaultFilter.endElement(DefaultFilter.java:206)
at org.cyberneko.html.filters.NamespaceBinder.endElement(NamespaceBinder.java:330)
at org.cyberneko.html.HTMLScanner$ContentScanner.scanEndElement(HTMLScanner.java:3126)
at org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:2093)
at org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:920)
at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:499)
at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:452)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.parse(HTMLParser.java:1017)
at com.gargoylesoftware.htmlunit.html.HTMLParser.parse(HTMLParser.java:248)
at com.gargoylesoftware.htmlunit.html.HTMLParser.parseHtml(HTMLParser.java:194)
at com.gargoylesoftware.htmlunit.DefaultPageCreator.createHtmlPage(DefaultPageCreator.java:268)
at com.gargoylesoftware.htmlunit.DefaultPageCreator.createPage(DefaultPageCreator.java:156)
at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseInto(WebClient.java:471)
at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:345)
at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:410)
at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:395)
at Main.testingGargoyle(Main.java:19)
at Main.main(Main.java:10)
======= EXCEPTION END ========
I told you it was big. How can I get around this and obtain the final source of this page so that I can scrape it?
Thanks in advance!
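Exceptions like this are thrown for several reasons: malformed HTML, errors in the page's scripts, or resources that cannot be found, such as CSS, script files, or image files (for example, bla.gif not found, HTTP 404).
So we use these options to keep navigating the HTML instead of stopping at the first error or problem: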
webClient.getOptions().setThrowExceptionOnScriptError(false);
webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
You can also plug in empty (no-op) handler classes to stop HtmlUnit from reporting CSS/JavaScript errors in detail on the console:
webClient.setCssErrorHandler(new SilentCssErrorHandler());
webClient.setJavaScriptErrorListener(new JavaScriptErrorListener() { /* implement each interface method with an empty body */ });
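The timestamped WARNING and SEVERE lines in the question's output look like java.util.logging output, so another way to quiet the console may be to turn that logger off. The following is only a sketch under that assumption: it presumes HtmlUnit's log messages end up in java.util.logging under a logger named after the `com.gargoylesoftware` package seen in the stack trace. The class name `QuietHtmlUnitLogging` and the method `silence()` are made up for illustration.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class QuietHtmlUnitLogging {
    // Keep a strong reference: java.util.logging holds loggers weakly,
    // so a level set on an otherwise unreferenced logger can be lost after GC.
    // The logger name is an assumption based on the package names in the trace.
    private static final Logger HTMLUNIT_LOGGER = Logger.getLogger("com.gargoylesoftware");

    // Turns off the parent logger; child loggers such as
    // "com.gargoylesoftware.htmlunit" inherit the level unless they set their own.
    static Logger silence() {
        HTMLUNIT_LOGGER.setLevel(Level.OFF);
        return HTMLUNIT_LOGGER;
    }

    public static void main(String[] args) {
        Logger logger = silence();
        System.out.println(logger.getName() + " level: " + logger.getLevel());
    }
}
```

Calling `QuietHtmlUnitLogging.silence()` before creating the WebClient would be the intended usage. Note that this only hides log output; it does not change whether an exception is thrown, which is what the two `setThrowExceptionOn...` options control.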
@Test
public void TestCall() throws FailingHttpStatusCodeException, MalformedURLException, IOException {
    WebClient webClient = new WebClient(BrowserVersion.CHROME);
    webClient.getOptions().setUseInsecureSSL(true); // ignore SSL certificate problems
    webClient.getOptions().setThrowExceptionOnScriptError(false);
    webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
    String url = "https://www.wearvr.com/#game_id=game_4";
    HtmlPage myPage = webClient.getPage(url);
    // give background JavaScript time to start and finish
    webClient.waitForBackgroundJavaScriptStartingBefore(200);
    webClient.waitForBackgroundJavaScript(20000);
    // do stuff on the page, e.g. myPage.getElementById("main")
    // myPage.asXml() <- tags and elements
    System.out.println(myPage.asText());
}
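If you still want to know what actually failed, a trace like the one in the question is easier to read when reduced to its innermost cause (there, the EcmaError listed under "Caused by:"). Below is a minimal, library-independent sketch of that idea. The class `RootCause` and its method `of` are hypothetical helpers, and the exceptions in `main` are plain stand-ins rather than real HtmlUnit types.

```java
final class RootCause {
    // Follows getCause() links until the innermost throwable is reached.
    static Throwable of(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Stand-ins for an outer script exception wrapping the JavaScript error.
        Exception inner = new IllegalStateException("TypeError: Cannot find function bind");
        Exception outer = new RuntimeException("script failed", inner);
        // Print only the innermost message instead of the whole trace.
        System.out.println(of(outer).getMessage());
    }
}
```

In a catch block, printing `RootCause.of(e).getMessage()` instead of calling `e.printStackTrace()` would give a single line that names the underlying error.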