Convert tab-indented text to an HTML unordered list?

缪英锐

2023-03-14

Question:

I'm a beginner, so this question may sound simple: I have some text files containing tab-indented text, for example:

A
	B
	C
		D
		E

Now I want to generate an HTML unordered list with this structure:

<ul>
<li>A
<ul><li>B</li>
<li>C
<ul><li>D</li>
<li>E</li></ul></li></ul></li>
</ul>

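My idea is to write a Python script, but if there is a simpler (automatic) way, that would be fine too. To identify the indentation level and the item names, I would try the following code: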

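import sys

indent = 0
last = []  # names of the enclosing (parent) items, one per open level
previous = None  # name of the item on the previous line
for line in sys.stdin:
    line = line.rstrip("\n")
    count = 0
    # Count and strip the leading tabs to get this line's indentation level.
    while line.startswith("\t"):
        count += 1
        line = line[1:]
    if count > indent:
        # One level deeper: the previous item becomes the parent.
        indent += 1
        last.append(previous)
    while count < indent:
        # Shallower: close as many levels as needed, not just one.
        indent -= 1
        last.pop()
    previous = line
    print(indent, line, last)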

Answer:

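The tokenize module understands your input format: each line contains a valid Python identifier, and the indentation level of each statement is significant. The ElementTree module lets you manipulate a tree structure in memory, which makes it more flexible to keep building the tree separate from rendering it as HTML: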

from tokenize import NAME, INDENT, DEDENT, ENDMARKER, NEWLINE, NL, generate_tokens
from xml.etree import ElementTree as etree

def parse(file, TreeBuilder=etree.TreeBuilder):
    tb = TreeBuilder()
    tb.start('ul', {})
    for type_, text, start, end, line in generate_tokens(file.readline):
        if type_ == NAME: # convert name to <li> item
            tb.start('li', {})
            tb.data(text)
            tb.end('li')
        elif type_ in (NEWLINE, NL): # end of a line, or a blank line
            continue
        elif type_ == INDENT: # start <ul>
            tb.start('ul', {})
        elif type_ == DEDENT: # end </ul>
            tb.end('ul')
        elif type_ == ENDMARKER: # done
            tb.end('ul') # end parent list
            break
        else: # unexpected token
            assert 0, (type_, text, start, end, line)
    return tb.close() # return root element
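To see which tokens parse() receives, here is a small sketch that lists the token types tokenize generates for the sample input. The helper `token_names` and the `SAMPLE` string are illustrative only, not part of the original answer. The INDENT and DEDENT tokens are what parse() turns into opening and closing <ul> tags.

```python
import io
import token
import tokenize

# Sample input from the question: items nested by leading tabs (illustrative).
SAMPLE = "A\n\tB\n\tC\n\t\tD\n\t\tE\n"

def token_names(text):
    """Return the names of the token types tokenize produces for `text`."""
    readline = io.StringIO(text).readline
    return [token.tok_name[tok.type] for tok in tokenize.generate_tokens(readline)]

if __name__ == "__main__":
    # Each item is a NAME; nesting shows up as INDENT/DEDENT pairs.
    print(token_names(SAMPLE))
```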

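Any class that provides .start(), .end(), .data(), and .close() methods can be used as the TreeBuilder. For example, you could write the HTML on the fly instead of building a tree.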
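A minimal sketch of such a class, assuming a hypothetical name `HtmlWriter` (not from the original answer): it writes each tag as soon as it is called instead of building a tree. The demo below replays by hand the calls parse() would make for the two-line input "A\n\tB\n", so the block runs on its own.

```python
import io
import sys
from html import escape

class HtmlWriter:
    """Hypothetical duck-typed stand-in for etree.TreeBuilder that streams HTML."""

    def __init__(self, out=None):
        # Default to stdout so the class can be created with no arguments,
        # which is how parse() calls TreeBuilder().
        self.out = out if out is not None else sys.stdout

    def start(self, tag, attrs):
        # attrs is ignored: parse() always passes an empty dict.
        self.out.write("<%s>" % tag)

    def data(self, text):
        # Escape &, <, > (and quotes) so item text cannot break the markup.
        self.out.write(escape(text))

    def end(self, tag):
        self.out.write("</%s>" % tag)

    def close(self):
        # Nothing to return: the HTML has already been written.
        return None

def demo():
    """Replay the calls parse() makes for the input "A\\n\\tB\\n"."""
    buf = io.StringIO()
    w = HtmlWriter(buf)
    w.start("ul", {})
    w.start("li", {})
    w.data("A")
    w.end("li")
    w.start("ul", {})  # INDENT
    w.start("li", {})
    w.data("B")
    w.end("li")
    w.end("ul")  # DEDENT
    w.end("ul")  # ENDMARKER closes the top-level list
    w.close()
    return buf.getvalue()

if __name__ == "__main__":
    print(demo())
```

With the parse() function above, `parse(sys.stdin, TreeBuilder=HtmlWriter)` would stream the list straight to stdout. Because close() returns None here, the ElementTree.write() step is skipped in that case.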

To parse standard input and write the HTML to standard output, you can use ElementTree.write():

import sys

etree.ElementTree(parse(sys.stdin)).write(sys.stdout, encoding='unicode', method='html')

Output:

<ul><li>A</li><ul><li>B</li><li>C</li><ul><li>D</li><li>E</li></ul></ul></ul>
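This output places each nested <ul> next to its parent <li> rather than inside it, which differs from the structure asked for in the question (in HTML, a <ul> may only contain <li> items). Below is a sketch of a variant, `parse_nested` (a hypothetical name, not part of the original answer), that keeps each <li> open until its next sibling or the end of its list, so nested lists land inside the parent item. It still assumes every item is a valid Python identifier.

```python
import io
from tokenize import NAME, INDENT, DEDENT, ENDMARKER, NEWLINE, NL, generate_tokens
from xml.etree import ElementTree as etree

def parse_nested(file):
    """Variant of parse() that puts each nested <ul> inside its parent <li>."""
    tb = etree.TreeBuilder()
    tb.start("ul", {})
    li_open = [False]  # one flag per open <ul>: is its last <li> still open?
    for tok in generate_tokens(file.readline):
        if tok.type == NAME:
            if li_open[-1]:
                tb.end("li")  # close the previous sibling item
            tb.start("li", {})
            tb.data(tok.string)
            li_open[-1] = True  # leave it open: a nested <ul> may follow
        elif tok.type in (NEWLINE, NL):
            continue
        elif tok.type == INDENT:
            tb.start("ul", {})  # opened inside the still-open <li>
            li_open.append(False)
        elif tok.type in (DEDENT, ENDMARKER):
            if li_open.pop():
                tb.end("li")  # close the last item of this list
            tb.end("ul")
            if tok.type == ENDMARKER:
                break
        else:
            raise ValueError("unexpected token: %r" % (tok,))
    return tb.close()

if __name__ == "__main__":
    sample = "A\n\tB\n\tC\n\t\tD\n\t\tE\n"
    root = parse_nested(io.StringIO(sample))
    print(etree.tostring(root, encoding="unicode", method="html"))
```

For the sample input this should print `<ul><li>A<ul><li>B</li><li>C<ul><li>D</li><li>E</li></ul></li></ul></li></ul>`, the same nesting as the list in the question.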

You can use any file object, not just sys.stdin/sys.stdout.

Note: on Python 3, because of the bytes/Unicode distinction, either write to sys.stdout.buffer or pass encoding="unicode" when writing to stdout.
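A small sketch of the two options, using in-memory streams in place of stdout; the helper name `render` is illustrative only. With the default encoding, ElementTree.write() emits bytes and needs a binary stream; with encoding="unicode" it emits str and works with a text stream.

```python
import io
from xml.etree import ElementTree as etree

def render(root):
    """Write `root` once to a binary stream and once to a text stream."""
    tree = etree.ElementTree(root)
    # Default encoding ("us-ascii") produces bytes, so a binary stream is
    # required; for stdout that would be sys.stdout.buffer.
    raw = io.BytesIO()
    tree.write(raw, method="html")
    # encoding="unicode" produces str, so a text stream such as sys.stdout works.
    text = io.StringIO()
    tree.write(text, encoding="unicode", method="html")
    return raw.getvalue(), text.getvalue()

if __name__ == "__main__":
    print(render(etree.fromstring("<ul><li>A</li></ul>")))
```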
