Question:

Plotting a tree with a variable number of child nodes?

宋志学

2023-03-14

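I want to generate a graph that visualizes the structure of an XML file.

I have created a list of nodes to represent the XML file.
Each node contains 3 strings: the XML tag, the attributes, and the content.

The XML file looks like this: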

<?xml version="1.0" encoding="UTF-8"?>
<entry db="genbank">
   <data id="AC116785" length="132912" molecule="DNA" data_class="linear" division="HTG" date="08-JUL-2002" />
   <definition>
      <description>Mus musculus clone RP24-146B1, WORKING DRAFT SEQUENCE, 10 ordered pieces.</description>
   </definition>
   <accession>AC116785</accession>
   <version>
      <version_number>AC116785.3</version_number>
      <gi>21703640</gi>
   </version>
   <keywords>
      <keyword>HTG</keyword>
      <keyword>HTGS_PHASE2</keyword>
      <keyword>HTGS_DRAFT</keyword>
      <keyword>HTGS_FULLTOP</keyword>
   </keywords>
   <source>
      <abbreviation>house mouse.</abbreviation>
      <organism>
         <name>Mus musculus</name>
         <taxonomy>
            <class>Eukaryota</class>
            <class>Metazoa</class>
            <class>Chordata</class>
            <class>Craniata</class>
            <class>Vertebrata</class>
            <class>Euteleostomi</class>
            <class>Mammalia</class>
            <class>Eutheria</class>
            <class>Rodentia</class>
            <class>Sciurognathi</class>
            <class>Muridae</class>
            <class>Murinae</class>
            <class>Mus</class>
         </taxonomy>
      </organism>
   </source>
   <references>
      <reference number="1" from="1" to="132912">
         <authors>
            <author>Birren,B.</author>
         </authors>
         <title>Mus musculus, clone RP24-146B1</title>
         <journal>
            <location>Unpublished</location>
         </journal>
      </reference>
      <reference number="2" from="1" to="132912">
         <authors>
            <author>Birren,B.</author>
         </authors>
         <title>Direct Submission</title>
         <journal>
            <submission>02-APR-2002</submission>
            <department>Whitehead Institute/MIT Center for Genome Research, 320 Charles Street, Cambridge, MA 02141, USA</department>
         </journal>
      </reference>
      <reference number="3" from="1" to="132912">
         <authors>
            <author>Birren,B.</author>
         </authors>
         <title>Direct Submission</title>
         <journal>
            <submission>08-JUL-2002</submission>
            <department>Whitehead Institute/MIT Center for Genome Research, 320 Charles Street, Cambridge, MA 02141, USA</department>
         </journal>
      </reference>
   </references>
   <comment>
      <replaced>
         <date>Jul 8, 2002</date>
         <gi>21700645</gi>
      </replaced>
      <information title="All repeats were identified using RepeatMasker">Smit, A.F.A. ,  Green, P. (1996-1997)http://ftp.genome.washington.edu/RM/RepeatMasker.html</information>
      <information title="Center">Whitehead Institute/ MIT Center for Genome Research</information>
      <information title="Center code">WIBR</information>
      <information title="Web site">http://www-seq.wi.mit.edu</information>
      <information title="Contact">sequence_submissions@genome.wi.mit.edu</information>
      <information title="Center project name">L25104</information>
      <information title="Center clone name">146_B_1</information>
      <information title="Sequencing vector">Plasmid; n/a; 100% of reads</information>
      <information title="Chemistry">Dye-terminator Big Dye; 100% of reads</information>
      <information title="Assembly program">Phrap; version 0.960731</information>
      <information title="Consensus quality">130058 bases at least Q40</information>
      <information title="Consensus quality">131186 bases at least Q30</information>
      <information title="Consensus quality">131595 bases at least Q20</information>
      <information title="Insert size">142000; agarose-fp</information>
      <information title="Insert size">132012; sum-of-contigs</information>
      <information title="Quality coverage">6.9 in Q20 bases; agarose-fp</information>
      <information title="Quality coverage">7.5 in Q20 bases; sum-of-contigs</information>
      <information title="NOTE">This is a 'working draft' sequence. It currently consists of 10 contigs. Gaps between the contigsare represented as runs of N. The order of the piecesis believed to be correct as given, however the sizesof the gaps between them are based on estimates that haveprovided by the submittor.This sequence will be replacedby the finished sequence as soon as it is available andthe accession number will be preserved.</information>
      <information title="1     1178">contig of 1178 bp in length</information>
      <information title="1179 1278">gap of      100 bp</information>
      <information title="1279     2835">contig of 1557 bp in length</information>
      <information title="2836 2935">gap of      100 bp</information>
      <information title="2936     5385">contig of 2450 bp in length</information>
      <information title="5386 5485">gap of      100 bp</information>
      <information title="5486     8192">contig of 2707 bp in length</information>
      <information title="8193 8292">gap of      100 bp</information>
      <information title="8293    10488">contig of 2196 bp in length</information>
      <information title="10489 10588">gap of      100 bp</information>
      <information title="10589    12801">contig of 2213 bp in length</information>
      <information title="12802 12901">gap of      100 bp</information>
      <information title="12902    18716">contig of 5815 bp in length</information>
      <information title="18717 18816">gap of      100 bp</information>
      <information title="18817    34793">contig of 15977 bp in length</information>
      <information title="34794 34893">gap of      100 bp</information>
      <information title="34894    51004">contig of 16111 bp in length</information>
      <information title="51005 51104">gap of      100 bp</information>
      <information title="51105   132912">contig of 81808 bp in length.</information>
   </comment>
   <features>
      <sequence_feature type="source">
         <location>1..132912</location>
         <qualifer type="db_xref">taxon:10090</qualifer>
         <qualifer type="clone">RP24-146B1</qualifer>
         <qualifer type="clone_lib">RPCI-24 Male Mouse BAC</qualifer>
      </sequence_feature>
      <sequence_feature type="misc_feature">
         <location>1..1178</location>
      </sequence_feature>
      <sequence_feature type="misc_feature">
         <location>1279..2835</location>
      </sequence_feature>
      <sequence_feature type="misc_feature">
         <location>2936..5385</location>
      </sequence_feature>
      <sequence_feature type="misc_feature">
         <location>5486..8192</location>
      </sequence_feature>
      <sequence_feature type="misc_feature">
         <location>8293..10488</location>
      </sequence_feature>
      <sequence_feature type="misc_feature">
         <location>10589..12801</location>
      </sequence_feature>
      <sequence_feature type="misc_feature">
         <location>12902..18716</location>
      </sequence_feature>
      <sequence_feature type="misc_feature">
         <location>18817..34793</location>
      </sequence_feature>
      <sequence_feature type="misc_feature">
         <location>34894..51004</location>
      </sequence_feature>
      <sequence_feature type="misc_feature">
         <location>51105..132912</location>
      </sequence_feature>
   </features>
   <base_count num_a="43599" num_c="24512" num_g="23668" num_t="40195" num_others="938" />
   <sequence>mhkkiciigagaaglvsakhaikqgyqvdifeqtdqvggtwvysektgchsslykvmktn
lpkeamlfqdepfrdelpsfmshehvleylnefskdfpiqfsstvnevkrendlwkvlie
snsetitrfydvvfvcnghffeplnpyqnsyfkgklihshdyrraehytgknvvivgagp
sgiditlqiaqtanhvtliskkatypvlpesvqqmatnvksvdehgvvtdegdhvpadvi
ivctgyvfkfpfldssliqlkyndrmvsplyehlchvdypttlffiglplgtitfplfev
qvkyalsliagkgklpsddveirnfedarlqgllnpasfhviieeqweymkklakmggfe
ewnymetikklygyimterkknvigykmvnfelttdssdfklltirvdfnddvawiirfa
ypi</sequence>
</entry>

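I want to generate a tree plot with the Plotly and igraph libraries by enumerating the node list.

I am using this website as a reference.

My XML file contains elements with a variable number of child elements. However, the example given only shows how to build a tree with a fixed number of children per node (it uses exactly 2 children per node).

Looking at the igraph tutorial site, I see a similar example in which every node again has only 2 children.

How should I generate a tree with a variable number of children per node, as in my XML file?

I have been stuck on this problem for a long time, so any help would be greatly appreciated!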

章稳

2023-03-14

You can build the graph like this:

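from igraph import Graph
from lxml import etree

# Parse the file and take the root element (<entry> in the question's XML).
root = etree.parse("entry.xml").getroot()

# Give every element an integer id in document order; the root gets id 0.
element_ids = {elem: i for i, elem in enumerate(root.iter())}

# One (parent_id, child_id) edge per parent/child pair. Iterating an element
# yields its direct children (getchildren() is deprecated), however many there are.
edges = []
for parent, parent_id in element_ids.items():
    for child in parent:
        edges.append((parent_id, element_ids[child]))

# Fix the vertex count explicitly so a document with a single element still works.
G = Graph(n=len(element_ids), edges=edges)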

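The element_ids dictionary has every element of the XML document as a key and a distinct integer ID as its value, e.g. {element1: 0, element2: 1, element3: 2}. The keys are the element objects themselves rather than tag names, so repeated tags such as <class> or <keyword> each get their own ID. This lets you look up the vertex ID of any element later. Because one edge is added per parent/child pair, the number of children per node does not matter.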
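To see that the edge-list approach does not depend on a fixed number of children, here is a minimal sketch on a cut-down version of the XML. It swaps in the standard library's `xml.etree.ElementTree` instead of lxml so that it runs without extra packages; the `SAMPLE` string and the `build_edges` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature document: <version> has 2 children, <keywords> has 3.
SAMPLE = """<entry>
  <version><version_number>AC116785.3</version_number><gi>21703640</gi></version>
  <keywords><keyword>HTG</keyword><keyword>HTGS_PHASE2</keyword><keyword>HTGS_DRAFT</keyword></keywords>
</entry>"""


def build_edges(root):
    """Return (labels, edges) for the element tree rooted at `root`."""
    elements = list(root.iter())  # document order, root first
    ids = {elem: i for i, elem in enumerate(elements)}
    labels = [elem.tag for elem in elements]
    # Iterating an element yields its direct children, however many there are.
    edges = [(ids[parent], ids[child]) for parent in elements for child in parent]
    return labels, edges


labels, edges = build_edges(ET.fromstring(SAMPLE))
print(labels)
print(edges)
```

Vertex 1 (`version`) ends up with two outgoing edges and vertex 4 (`keywords`) with three, and the same `edges` list can be handed to `Graph(...)` unchanged.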

I don't know how to put the labels into Plotly, but for plotting with igraph it is useful to add the tag names as vertex labels:

names = [e.tag for e in element_ids]
G.vs['label'] = names

I have not tried it, but the rest of the visualization should work the same way as in the article.
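For the Plotly side, a line trace usually needs flat x and y lists in which `None` separates one edge from the next. The sketch below shows that conversion in plain Python. The `edge_segments` helper and the sample coordinates are made up for illustration; in practice the coordinates would presumably come from an igraph tree layout such as `G.layout_reingold_tilford(root=[0]).coords`.

```python
def edge_segments(coords, edges):
    """Flatten edges into x/y lists, with None closing each line segment."""
    xs, ys = [], []
    for a, b in edges:
        xs += [coords[a][0], coords[b][0], None]
        ys += [coords[a][1], coords[b][1], None]
    return xs, ys


# Hypothetical coordinates for a root (vertex 0) with three children (1, 2, 3).
coords = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
edges = [(0, 1), (0, 2), (0, 3)]

xs, ys = edge_segments(coords, edges)
print(xs)
print(ys)
```

The resulting lists could then be passed to something like `plotly.graph_objects.Scatter(x=xs, y=ys, mode="lines")`, with a second trace in `mode="markers+text"` for the vertices and their labels. Nothing here assumes two children per node; a vertex simply contributes one segment per child.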
