I am replacing all line breaks inside pre tags with <br />, using the following code:
String body = "<p>This is the output:</p>\n<pre class=\"lang-xml prettyprint prettyprinted\">\n<code><span class=\"dec\"><!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\" \"http://www.w3.org/TR/html4/loose.dtd\"></span><span class=\"pln\">\n</span><span class=\"tag\"><HTML></span><span class=\"pln\">\n </span><span class=\"tag\"><HEAD></span><span class=\"pln\">\n </span><span class=\"tag\"><META</span><span class=\"pln\"> </span><span class=\"atn\">http-equiv</span><span class=\"pun\">=</span><span class=\"atv\">\"Content-Type\"</span><span class=\"pln\"> </span><span class=\"atn\">content</span><span class=\"pun\">=</span><span class=\"atv\">\"text/html; charset=iso-8859-1\"</span><span class=\"tag\">></span><span class=\"pln\">\n </span><span class=\"tag\"><TITLE></span><span class=\"pln\">GeteBayOfficialTime</span><span class=\"tag\"></TITLE></span><span class=\"pln\">\n </span><span class=\"tag\"></HEAD></span><span class=\"pln\">\n </span><span class=\"tag\"><BODY></span><span class=\"pln\">\n\n* About to connect() to api.ebay.com port 443 (#0)\n* Trying 66.135.211.100... * Timeout\n* Trying 66.135.211.140... * Timeout\n* Trying 66.211.179.150... * Timeout\n* Trying 66.211.179.180... * Timeout\n* Trying 66.135.211.101... * Timeout\n* Trying 66.211.179.148... * Timeout\n* connect() timed out!\n* Closing connection #0\n</span><span class=\"tag\"><P></span><span class=\"pln\">Error sending request</span></code></pre>";
log.info("printing before creating a Jsoup Doc "+ body);
Document bodyDom = Jsoup.parse(body);
log.info("printing after creating a Jsoup Doc "+ bodyDom.html());
Elements preTags = bodyDom.getElementsByTag("pre");
for (Element pre : preTags) {
pre.html(pre.html().replaceAll("(\r\n|\n)", "<br />"));
log.info("Pre element with linebreaks replaced -" + pre);
}
body = bodyDom.html();
Here is the log. The HTML source seems to lose its line breaks as soon as I parse it into a Jsoup document:
**2013-12-10 10:14:59 INFO FormattingTest:166** - printing before creating a Jsoup Doc <p>This is the output:</p>
<pre class="lang-xml prettyprint prettyprinted">
<code><span class="dec"><!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"></span><span class="pln">
</span><span class="tag"><HTML></span><span class="pln">
</span><span class="tag"><HEAD></span><span class="pln">
</span><span class="tag"><META</span><span class="pln"> </span><span class="atn">http-equiv</span><span class="pun">=</span><span class="atv">"Content-Type"</span><span class="pln"> </span><span class="atn">content</span><span class="pun">=</span><span class="atv">"text/html; charset=iso-8859-1"</span><span class="tag">></span><span class="pln">
</span><span class="tag"><TITLE></span><span class="pln">GeteBayOfficialTime</span><span class="tag"></TITLE></span><span class="pln">
</span><span class="tag"></HEAD></span><span class="pln">
</span><span class="tag"><BODY></span><span class="pln">
* About to connect() to api.ebay.com port 443 (#0)
* Trying 66.135.211.100... * Timeout
* Trying 66.135.211.140... * Timeout
* Trying 66.211.179.150... * Timeout
* Trying 66.211.179.180... * Timeout
* Trying 66.135.211.101... * Timeout
* Trying 66.211.179.148... * Timeout
* connect() timed out!
* Closing connection #0
</span><span class="tag"><P></span><span class="pln">Error sending request</span></code></pre>
**2013-12-10 10:14:59 INFO FormattingTest:168** - printing after creating a Jsoup Doc <html>
<head></head>
<body>
<p>This is the output:</p>
<pre class="lang-xml prettyprint prettyprinted">
<code><span class="dec"><!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"></span><span class="pln"> </span><span class="tag"><HTML></span><span class="pln"> </span><span class="tag"><HEAD></span><span class="pln"> </span><span class="tag"><META</span><span class="pln"> </span><span class="atn">http-equiv</span><span class="pun">=</span><span class="atv">"Content-Type"</span><span class="pln"> </span><span class="atn">content</span><span class="pun">=</span><span class="atv">"text/html; charset=iso-8859-1"</span><span class="tag">></span><span class="pln"> </span><span class="tag"><TITLE></span><span class="pln">GeteBayOfficialTime</span><span class="tag"></TITLE></span><span class="pln"> </span><span class="tag"></HEAD></span><span class="pln"> </span><span class="tag"><BODY></span><span class="pln"> * About to connect() to api.ebay.com port 443 (#0) * Trying 66.135.211.100... * Timeout * Trying 66.135.211.140... * Timeout * Trying 66.211.179.150... * Timeout * Trying 66.211.179.180... * Timeout * Trying 66.135.211.101... * Timeout * Trying 66.211.179.148... * Timeout * connect() timed out! * Closing connection #0 </span><span class="tag"><P></span><span class="pln">Error sending request</span></code></pre>
</body>
</html>
2013-12-10 10:14:59 INFO FormattingTest:174 - Pre element with linebreaks replaced - <pre class="lang-xml prettyprint prettyprinted"><code><span class="dec"><!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"></span><span class="pln"> </span><span class="tag"><HTML></span><span class="pln"> </span><span class="tag"><HEAD></span><span class="pln"> </span><span class="tag"><META</span><span class="pln"> </span><span class="atn">http-equiv</span><span class="pun">=</span><span class="atv">"Content-Type"</span><span class="pln"> </span><span class="atn">content</span><span class="pun">=</span><span class="atv">"text/html; charset=iso-8859-1"</span><span class="tag">></span><span class="pln"> </span><span class="tag"><TITLE></span><span class="pln">GeteBayOfficialTime</span><span class="tag"></TITLE></span><span class="pln"> </span><span class="tag"></HEAD></span><span class="pln"> </span><span class="tag"><BODY></span><span class="pln"> * About to connect() to api.ebay.com port 443 (#0) * Trying 66.135.211.100... * Timeout * Trying 66.135.211.140... * Timeout * Trying 66.211.179.150... * Timeout * Trying 66.211.179.180... * Timeout * Trying 66.135.211.101... * Timeout * Trying 66.211.179.148... * Timeout * connect() timed out! * Closing connection #0 </span><span class="tag"><P></span><span class="pln">Error sending request</span></code></pre>
I am not sure what is going wrong. The same code, given another HTML source, "\nResponse :\nsome text\n\ndsjkhskjdh sdjhasjkdas\n",
converts it correctly to:
Response :
some text
dsjkhskjdh sdjhasjkdas
I don't know why the first sample is not converted the same way.
The problem is that when you try to do this:
Jsoup.parse("\nText\nNex").html();
you get:
Text Nex
Based on those questions, you can do the following:
Document bodyDom = Jsoup.parse(body.replaceAll("(\\r\\n|\\n)", "<br />"));
That is, replace the line breaks before parsing the document.
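As a minimal sketch of what that pre-parse replacement does to the raw string, the snippet below uses only the JDK and no Jsoup. The class and method names (`NewlinesToBr`, `newlinesToBr`) and the sample input are hypothetical, chosen for illustration. Note that this approach converts every line break in the input, including the ones outside the pre block.

```java
public class NewlinesToBr {

    // Replaces every CRLF or LF in the raw HTML string with a <br /> tag.
    // Hypothetical helper name; it mirrors the replaceAll call shown above.
    public static String newlinesToBr(String html) {
        return html.replaceAll("\\r\\n|\\n", "<br />");
    }

    public static void main(String[] args) {
        String html = "<p>intro</p>\n<pre>line 1\r\nline 2</pre>";
        // The line break between </p> and <pre> is converted as well.
        System.out.println(newlinesToBr(html));
    }
}
```

If only the content of the pre block should change, the line break between the two elements in this sample is an unwanted side effect.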
To replace only the line breaks between the two pre tags, extract those blocks with a regular expression and replace within them:
Pattern preP = Pattern.compile("<pre.*?>.+?</pre>", Pattern.DOTALL
| Pattern.CASE_INSENSITIVE);
Matcher m = preP.matcher(body);
while (m.find()) {
String toReplace = m.group();
String replaced = toReplace.replaceAll("(\r\n|\n)", "<br />");
body = body.replace(toReplace, replaced);
}
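.+? is a reluctant (non-greedy) quantifier, so the match stops at the first occurrence of </pre>. You can try to process HTML with regular expressions, but it cannot be done reliably in the general case; see this answer for a better explanation. I recommend that you use the next option instead.
You can see an example of the regular expression here.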
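The loop above can also be written with `Matcher.appendReplacement`, which avoids searching the whole string again for every match. The following is a sketch under stated assumptions rather than the answer's own code: the class name `PreLineBreaks`, the helper names, and the sample input are hypothetical. It also counts matches to show how the reluctant pattern differs from a greedy one.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PreLineBreaks {

    // Same pattern as above: the reluctant .*? and .+? stop at the first </pre>.
    private static final Pattern PRE_BLOCK = Pattern.compile(
            "<pre.*?>.+?</pre>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    // Replaces CRLF/LF with <br /> inside each <pre>...</pre> block only.
    public static String replaceInPre(String html) {
        Matcher m = PRE_BLOCK.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replaced = m.group().replaceAll("\\r\\n|\\n", "<br />");
            // quoteReplacement keeps '$' and '\' in the HTML from being
            // interpreted as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replaced));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Counts how many separate matches a regex produces on the input.
    public static int countMatches(String regex, String html) {
        Matcher m = Pattern.compile(
                regex, Pattern.DOTALL | Pattern.CASE_INSENSITIVE).matcher(html);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String html = "<p>a</p>\n<pre>x\ny</pre>\n<pre>z\r\nw</pre>";
        System.out.println(replaceInPre(html));
        // Reluctant: each pre block is its own match.
        System.out.println(countMatches("<pre.*?>.+?</pre>", html));
        // Greedy: one match running from the first <pre> to the last </pre>.
        System.out.println(countMatches("<pre.*>.+</pre>", html));
    }
}
```

With the greedy variant, the line break between the two pre blocks falls inside the single match and would be replaced too, which is why the reluctant form is used here. The caveat about handling HTML with regular expressions still applies, for example to nested or malformed pre elements.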
Based on the second answer, you can use:
Document.OutputSettings outputSettings = new Document.OutputSettings()
.prettyPrint(false);
body = Jsoup.clean(body, "", Whitelist.relaxed(), outputSettings);
Then, as in the original code:
pre.html(pre.html().replaceAll("(\r\n|\n)", "<br />"));
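Setting the prettyPrint option to false makes the clean method leave the line breaks in place, so the parser then handles them correctly.
Cheers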