Question:

How do I get the data between headings?

江新

2023-03-14

I am new to XSLT. I want to transform the following input into the output shown below:

Input:

<ATTRIBUTE-VALUE>
    <THE-VALUE>
        <div xmlns="http://www.w3.org/1999/xhtml">
            <h1 dir="ltr" id="_1536217498885">Main Description</h1>
            Line1 The main description text goes here.
            <p>Line2 The main description text goes here.</p>
            &lt;p&gt;Line3 The main description text goes here.&lt;/p&gt;
            <p><img alt="Embedded Image" class="embeddedImageLink" id="_1536739954166" src="_9c3778a0-d596-4eef-85fa-052a5e1b2166.jpg"/></p>
            <h1 dir="ltr" id="_1536217498886">Key Consideration</h1>
            <p>Line1 The key consideration text goes here.</p>
            <p>Line2 The key consideration text goes here.</p>
            <h1 dir="ltr" id="_1536217498887">Skills</h1>
            <p>Line1 The Skills text goes here.</p>
            <p>Line2 The Skills text goes here.</p>
            <p>Line3 The Skills text goes here.</p>
            <h1 dir="ltr" id="_1536217498888">Synonyms</h1>
            &lt;p&gt;The Synonyms text goes here.&lt;/p&gt;
        </div>
    </THE-VALUE>
</ATTRIBUTE-VALUE>

The output should be:

<MainDescription>
    <![CDATA[
        <p>Line1 The main description text goes here.</p>
        <p>Line2 The main description text goes here.</p>
        <p>Line3 The main description text goes here.</p>
        <p><img alt="Embedded Image" class="embeddedImageLink" id="_1536739954166" src="_9c3778a0-d596-4eef-85fa-052a5e1b2166.jpg"/></p>
    ]]>
</MainDescription>
<KeyConsiderations>
    <![CDATA[
        <p>Line1 The key consideration text goes here.</p>
        <p>Line2 The key consideration text goes here.</p>
    ]]>
</KeyConsiderations>
<Skills>
    <p>Line1 The Skills text goes here.</p>
    <p>Line2 The Skills text goes here.</p>
    <p>Line3 The Skills text goes here.</p>
</Skills>
<Synonyms>
    <p>The Synonyms text goes here.</p>
</Synonyms>


XSL code:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xhtml="http://www.w3.org/1999/xhtml"
    xmlns:exsl="http://exslt.org/common"
    exclude-result-prefixes="xhtml exsl"
    version="1.0">

  <xsl:import href="http://lenzconsulting.com/xml-to-string/xml-to-string.xsl"/>

  <xsl:output method="xml" indent="yes"
    cdata-section-elements="MainDescription KeyConsideration"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="/">
      <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:key name="h1-group" match="xhtml:div/*[not(self::xhtml:h1)]" use="generate-id(preceding-sibling::xhtml:h1[1])"/>

  <xsl:template match="xhtml:div[xhtml:h1]">
      <xsl:apply-templates select="xhtml:h1"/>
  </xsl:template>

  <xsl:template match="xhtml:h1">
      <xsl:element name="{translate(., ' ', '')}">
          <xsl:variable name="rtf-with-xhtml-ns-stripped">
              <xsl:apply-templates select="key('h1-group', generate-id())"/>
          </xsl:variable>
          <xsl:apply-templates select="exsl:node-set($rtf-with-xhtml-ns-stripped)/node()" mode="xml-to-string"/>
      </xsl:element>
  </xsl:template>

  <xsl:template match="xhtml:p">
      <p>
          <xsl:apply-templates/>
      </p>
  </xsl:template>

</xsl:stylesheet>

The output I get is:

<ATTRIBUTE-VALUE>
  <THE-VALUE>
    <MainDescription><![CDATA[<p>Line2 The main description text goes here.</p><p><img alt="Embedded Image" class="embeddedImageLink" id="_1536739954166" src="_9c3778a0-d596-4eef-85fa-052a5e1b2166.jpg" xmlns="http://www.w3.org/1999/xhtml"/></p>]]></MainDescription>
    <KeyConsideration><![CDATA[<p>Line1 The key consideration text goes here.</p><p>Line2 The key consideration text goes here.</p>]]></KeyConsideration>
    <Skills>&lt;p&gt;Line1 The Skills text goes here.&lt;/p&gt;&lt;p&gt;Line2 The Skills text goes here.&lt;/p&gt;&lt;p&gt;Line3 The Skills text goes here.&lt;/p&gt;</Skills>
    <Synonyms />
  </THE-VALUE>
</ATTRIBUTE-VALUE>

1 answer

萧懿轩

2023-03-14

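If you change the key's match pattern to use node() instead of *, so that it matches every child node rather than only element children, the text nodes are included as well: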

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xhtml="http://www.w3.org/1999/xhtml"
    xmlns:exsl="http://exslt.org/common"
    exclude-result-prefixes="xhtml exsl"
    version="1.0">

  <xsl:import href="http://lenzconsulting.com/xml-to-string/xml-to-string.xsl"/>

  <xsl:output method="xml" indent="yes"
    cdata-section-elements="MainDescription KeyConsideration"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="/">
      <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:key name="h1-group" match="xhtml:div/node()[not(self::xhtml:h1)]" use="generate-id(preceding-sibling::xhtml:h1[1])"/>

  <xsl:template match="xhtml:div[xhtml:h1]">
      <xsl:apply-templates select="xhtml:h1"/>
  </xsl:template>

  <xsl:template match="xhtml:h1[. = 'Main Description' or . = 'Key Consideration']">
      <xsl:element name="{translate(., ' ', '')}">
          <xsl:variable name="rtf-with-xhtml-ns-stripped">
              <xsl:apply-templates select="key('h1-group', generate-id())"/>
          </xsl:variable>
          <xsl:apply-templates select="exsl:node-set($rtf-with-xhtml-ns-stripped)/node()" mode="xml-to-string"/>
      </xsl:element>
  </xsl:template>

  <xsl:template match="xhtml:h1">
      <xsl:element name="{translate(., ' ', '')}">
          <xsl:variable name="rtf-with-xhtml-ns-stripped">
              <xsl:apply-templates select="key('h1-group', generate-id())"/>
          </xsl:variable>
          <xsl:apply-templates select="exsl:node-set($rtf-with-xhtml-ns-stripped)/node()"/>
      </xsl:element>
  </xsl:template>

  <xsl:template match="text()">
      <xsl:value-of select="." disable-output-escaping="yes"/>
  </xsl:template>

  <xsl:template match="xhtml:p">
      <p>
          <xsl:apply-templates/>
      </p>
  </xsl:template>

</xsl:stylesheet>

https://xsltfiddle.liberty-development.net/bdxtqy/41
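
To see in isolation what the changed key pattern does, here is a minimal sketch that counts, for each h1, the siblings collected by the two patterns. The key names elements-only and all-nodes and the result elements counts and section are made up for this illustration and are not part of the stylesheet above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xhtml="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="xhtml"
    version="1.0">

  <xsl:output method="xml" indent="yes"/>

  <!-- Original pattern: only element children of the div are indexed -->
  <xsl:key name="elements-only"
    match="xhtml:div/*[not(self::xhtml:h1)]"
    use="generate-id(preceding-sibling::xhtml:h1[1])"/>

  <!-- Changed pattern: text nodes (and comments etc.) are indexed too -->
  <xsl:key name="all-nodes"
    match="xhtml:div/node()[not(self::xhtml:h1)]"
    use="generate-id(preceding-sibling::xhtml:h1[1])"/>

  <!-- Hypothetical report: one section element per heading with both counts -->
  <xsl:template match="/">
    <counts>
      <xsl:for-each select="//xhtml:h1">
        <section title="{.}"
          elements="{count(key('elements-only', generate-id()))}"
          nodes="{count(key('all-nodes', generate-id()))}"/>
      </xsl:for-each>
    </counts>
  </xsl:template>

</xsl:stylesheet>
```

Run against the input from the question, the Synonyms heading is followed only by escaped text, so the elements-only key finds nothing for it, which corresponds to the empty Synonyms element in the question's output. Because this sketch has no xsl:strip-space, whitespace-only text nodes are also counted by the all-nodes key.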

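It is not clear when or where you want plain text such as "Line1 The main description text goes here." to be wrapped in p elements.

For the CDATA sections and the use of disable-output-escaping, I think you need to override the template for text() nodes in the imported xml-to-string stylesheet: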

  <xsl:template match="text()" mode="xml-to-string">
      <xsl:value-of select="." disable-output-escaping="yes"/>
  </xsl:template>

https://xsltfiddle.liberty-development.net/bdxtqy/42

I have not tested whether this breaks anything.
